






THE ANNALS 

of 

MATHEMATICAL 

STATISTICS

(Founded by H. C. Carver)

The Official Journal of the Institute of 
Mathematical Statistics 


VOLUME XX 




CONTENTS OF VOLUME XX 


Articles

Anderson, T. W. and Herman Rubin. Estimation of the Parameters of a 

Single Equation in a Complete System of Stochastic Equations. 46 

Andrews, F. C. and Z. W. Birnbaum. On Sums of Symmetrically Truncated

Normal Random Variables . 458 

Baker, G. A. The Variance of the Proportions of Samples Falling Within 

a Fixed Interval for a Normal Population. 123 

Banerjee, K. S. A Note on Weighing Design. 300

Bancroft, T. A. Some Recurrence Formulae in the Incomplete Beta Function
Ratio. 451

Barankin, E. W. Locally Best Unbiased Estimates. 477 

Berger, Agnes and Abraham Wald. On Distinct Hypotheses. 104 

Birnbaum, Z. W. and F. C. Andrews. On Sums of Symmetrically Truncated 

Normal Random Variables. 458 

Birnbaum, Z. W. and H. S. Zuckerman. A Graphical Determination of 

Sample Size for Wilks’ Tolerance Limits. 313 

Blom, Gunnar. A Generalization of Wald’s Fundamental Identity. 439

Boas, R. P., Jr. Representation of Probability Distributions by Charlier 

Series . 376 

Bose, R. C. A Note on Fisher’s Inequality for Balanced Incomplete Block

Designs. 619

Chernoff, Herman. Asymptotic Studentization in Testing of Hypotheses. 268 

David, Herbert T. A Note on Random Walk. 603 

Doob, J. L. Heuristic Approach to the Kolmogorov-Smirnov Theorems. . 393 
Dvoretzky, Aryeh. On the Strong Stability of a Sequence of Events... 296

Dwyer, Paul S. Pearsonian Correlation Coefficients Associated with Least 

Squares Theory. 404 

Epstein, Benjamin. A Modified Extreme Value Problem. 99 

Epstein, Benjamin. The Distribution of Extreme Values in Samples Whose 

Members are Subject to a Markoff Chain Condition. 590 

Erdos, P. On a Theorem of Hsu and Robbins. 286 

Godwin, H. J. A Note on Kac’s Derivation of the Distribution of the Mean 

Deviation. 127 

Godwin, H. J. Some Low Moments of Order Statistics. 279 

Goodman, Leo A. On the Estimation of the Number of Classes in a Population. 572

Greenwood, Robert E. Numerical Integration for Linear Sums of Exponential
Functions. 608

Grubbs, Frank E. On Designing Single Sampling Inspection Plans.... 242 
Halmos, Paul R. and L. J. Savage. Application of the Radon-Nikodym 

Theorem to the Theory of Sufficient Statistics. 225 



Hansen, Morris H. and William N. Hurwitz. On the Determination of 

Optimum Probabilities in Sampling. 426 

Hatke, Sister Mary Agnes. A Certain Cumulative Probability Function. 461

Hoel, P. G. and R. P. Peterson. A Solution to the Problem of Optimum 

Classification. 433 

Horton, H. Burke and R. Tynes Smith, III. A Direct Method for Producing
Random Digits in Any Number System. 82

Howell, John M. Control Chart for Largest and Smallest Values. 305 

Hurwitz, William N. and Morris H. Hansen. On the Determination of 

Optimum Probabilities in Sampling. 426 

Kimball, Bradford F. An Approximation to the Sampling Variance of an 
Estimated Maximum Value of Given Frequency Based on Fit of 

Doubly Exponential Distribution of Maximum Values. 110 

Lehmann, E. L. and C. Stein. On the Theory of Some Non-Parametric 

Hypotheses. 28 

Lev, Joseph. The Point Biserial Coefficient of Correlation. 125 

Levene, Howard. On a Matching Problem Arising in Genetics. 91 

Matern, Bertil. Independence of Non-Negative Quadratic Forms in

Normally Correlated Variables. 119 

Madow, William G. On the Theory of Systematic Sampling, II. 333 

McMillan, Brockway. Spread of Minima of Large Samples. 444 

Mood, A. M. Tests of Independence in Contingency Tables as Unconditional
Tests. 114

Noether, Gottfried E. On a Theorem by Wald and Wolfowitz. 455

Olds, Edwin G. The 5% Significance Levels for Sums of Squares of Rank 

Differences and a Correction. 117 

Otter, Richard. The Multiplicative Process. 206 

Paulson, Edward. A Multiple Decision Procedure for Certain Problems 

in the Analysis of Variance. 95 

Peiser, A. M. Correction to “Asymptotic Formulas for Significance Levels

of Certain Distributions”. 128 

Peterson, R. P. and P. G. Hoel. A Solution to the Problem of Optimum 

Classification. 433 

Pitman, E. J. G. and Herbert Robbins. Application of the Method of 

Mixtures to Quadratic Forms in Normal Variates. 552 

Quenouille, M. H. Problems in Plane Sampling. 355

Quenouille, M. H. The Joint Distribution of Serial Correlation Coefficients. 561

Reich, Edgar. On the Convergence of the Classical Iterative Method of 

Solving Linear Simultaneous Equations. 448 

Riordan, John. Inversion Formulas in Normal Variable Mapping. 417 

Robbins, Herbert and E. J. G. Pitman. Application of the Method of 
Mixtures to Quadratic Forms in Normal Variates. 552 































Rubin, Herman and T. W. Anderson. Estimation of the Parameters of a 

Single Equation in a Complete System of Stochastic Equations. 46 

Savage, L. J. and Paul R. Halmos. Application of the Radon-Nikodym 

Theorem to the Theory of Sufficient Statistics. 225 

Sard, Arthur. Smoothest Approximation Formulas. 612 

Seth, G. R. On the Variance of Estimates. 1 

Sobel, Milton and Abraham Wald. A Sequential Decision Procedure for
Choosing one of Three Hypotheses Concerning the Unknown Mean of 

a Normal Distribution. 502 

Smith, R. Tynes, III and H. Burke Horton. A Direct Method for Producing
Random Digits in Any Number System. 82

Stein, C. and E. L. Lehmann. On the Theory of Some Non-Parametric 

Hypotheses. 28 

Tukey, John W. Sufficiency, Truncation and Selection. 309

Tukey, John W. Moments of Random Group Size Distributions. 523 

von Schelling, Hermann. A Formula for the Partial Sums of Some Hypergeometric
Series. 120

Wald, Abraham and Agnes Berger. On Distinct Hypotheses. 104 

Wald, Abraham. Statistical Decision Functions. 165 

Wald, Abraham. Note on the Consistency of the Maximum Likelihood 

Estimate. 595 

Wald, Abraham and Milton Sobel. A Sequential Decision Procedure for 
Choosing one of Three Hypotheses Concerning the Unknown Mean of 

a Normal Distribution. 502 

Walsh, John E. Some Significant Tests for the Median which are Valid 

Under Very General Conditions. 64 

Walsh, John E. On the Range-Midrange Test and Some Tests with 

Bounded Significance Levels. 257

Walsh, John E. On the Power Function of the “Best” t-test Solution of the 

Behrens-Fisher Problem. 616 

Walsh, John E. Concerning Compound Randomization in a Binary System. 580

Wolfowitz, J. On Wald’s Proof of the Consistency of the Maximum Likelihood
Estimate. 601

Wolfowitz, J. The Power of the Classical Tests Associated with the Normal

Distribution. 540 

Woodbury, Max A. On a Probability Distribution. 311 

Yosida, Kôsaku. Brownian Motion on the Surface of the 3-Sphere. 292

Zuckerman, H. S. and Z. W. Birnbaum. A Graphical Determination of
Sample Size for Wilks’ Tolerance Limits. 313 

Miscellaneous 

Abstracts of Papers. 130, 317, 464, 620

Constitution and By-Laws of the Institute. 327 






























Election of Officers and Council and Revision of By-Laws. 150 

News and Notices. 144, 321, 470, 624

Report on the Berkeley Meeting of the Institute. 475 

Report on the Boulder Meeting of the Institute. 626 

Report on the Cleveland Meeting of the Institute. 152 

Report on the New York Meeting of the Institute. 325

Report on the Seattle Meeting of the Institute. 151 

Report of the President of the Institute. 156 

Report of the Secretary-Treasurer of the Institute. 160 

Report of the Editor. 163 















ON THE VARIANCE OF ESTIMATES 

By G. R. Seth 
Columbia University 

Summary. In this paper recent results on the lower bound to the variance 
of unbiased estimates have been brought together. Some of them have been 
extended to sequential estimates and the others have been improved to some 
extent. In the last section a general method for generating a system of orthogonal
polynomials with respect to a certain class of weight functions is obtained
together with a result on the conditions under which the class of unbiased estimates
formed by all functions of an unbiased estimate consists of just one element.

1. Introduction. 

§1.1. Let $X_1, X_2, \cdots$ be a sequence of chance variables whose distribution
depends upon an unknown parameter $\theta$ and possibly also a finite number of other
parameters. It is assumed that either all the $X$'s are absolutely continuous or
that they are all discrete. Let $p_M(x_1, x_2, \cdots, x_M; \theta)$ denote the joint
probability density function or the probability of $(X_1, \cdots, X_M)$ according as the
$X$'s are continuous or discrete. Let $\theta^*(x_1, x_2, \cdots, x_n)$ be an unbiased estimate
of $\theta$, where $x_1, x_2, \cdots, x_n$ is a sequence of observations on $X_1, X_2, \cdots, X_n$.

In this paper, we shall make use of the following short forms and abbreviations:

$E(X)$ will represent the expectation of $X$.

$\sigma^2(X)$ will represent the variance of $X$.

$E(y \mid x)$ will represent the conditional expectation of $y$, given $x$.

$\theta^*$ will represent an abbreviation of $\theta^*(x_1, x_2, \cdots, x_n)$.

$f$ will represent an abbreviation of $f(x; \theta)$ or $f(x; \theta_1, \theta_2, \cdots, \theta_T)$.

$p_n$ will represent an abbreviation of $p_n(x_1, x_2, \cdots, x_n; \theta)$ or
$p_n(x_1, x_2, \cdots, x_n; \theta_1, \theta_2, \cdots, \theta_T)$.

$p_N$ will represent $p_n$ for a fixed size sample, i.e., $n = N$.

$g$ will represent an abbreviation of $g(\theta^*; \theta)$ or
$g(\theta_1^*, \theta_2^*, \cdots, \theta_T^*; \theta_1, \theta_2, \cdots, \theta_T)$.

$h$ will represent $h(\xi_1, \xi_2, \cdots, \xi_{N-1} \mid \theta^*; \theta)$ or
$h(\xi_1, \xi_2, \cdots, \xi_{N-T} \mid \theta_1^*, \theta_2^*, \cdots, \theta_T^*; \theta_1, \theta_2, \cdots, \theta_T)$.

$\phi_{i_1, i_2, \cdots, i_T}(n)$ will represent $\dfrac{1}{p_n} \dfrac{\partial^{\,i_1 + i_2 + \cdots + i_T}}{\partial\theta_1^{i_1}\, \partial\theta_2^{i_2} \cdots \partial\theta_T^{i_T}}\, p_n$.

$g_{i_1, i_2, \cdots, i_T}$ will represent $\dfrac{1}{g} \dfrac{\partial^{\,i_1 + i_2 + \cdots + i_T}}{\partial\theta_1^{i_1}\, \partial\theta_2^{i_2} \cdots \partial\theta_T^{i_T}}\, g$.

$h_{i_1, i_2, \cdots, i_T}$ will represent $\dfrac{1}{h} \dfrac{\partial^{\,i_1 + i_2 + \cdots + i_T}}{\partial\theta_1^{i_1}\, \partial\theta_2^{i_2} \cdots \partial\theta_T^{i_T}}\, h$.

In case differentiations with respect to one parameter are involved, the last
three abbreviations will be shortened to $\phi_i(n)$, $g_i$ and $h_i$ respectively.

In §1.1, $n$ is assumed to be a constant equal to $N$, that is, the sequence of chance
variables is finite and fixed, consisting of $X_1, X_2, X_3, \cdots, X_N$.

Cramér [1] and Rao [2] have shown that under certain conditions of regularity,
the variance of $\theta^*(x_1, x_2, \cdots, x_N)$ satisfies the inequality:

(1.1.1)    $\sigma^2(\theta^*(x_1, x_2, \cdots, x_N)) \geq \dfrac{1}{E\left(\dfrac{\partial \log p_N}{\partial \theta}\right)^2}.$


Cramér [1] has shown that the lower bound for the variance of
$\theta^*(x_1, x_2, \cdots, x_N)$ given by (1.1.1) is achieved if and only if:

(1.1.2). There exists a sufficient statistic for estimating $\theta$,

(1.1.3). The probability distribution $g(\theta^*; \theta)$ of the sufficient statistic
$\theta^*(x_1, x_2, \cdots, x_N)$ is of the form

$\theta^*(x_1, x_2, \cdots, x_N) - \theta = \dfrac{K}{g(\theta^*; \theta)} \dfrac{\partial}{\partial\theta}\, g(\theta^*; \theta)$, whenever $g(\theta^*; \theta) > 0$,

where $K$ depends only upon $N$ and the parameters in the distribution.

Cramér calls the statistic $\theta^*(x_1, x_2, \cdots, x_N)$ satisfying (1.1.2) and (1.1.3)
an “efficient” statistic estimating $\theta$ and we will use the word “efficient” in this
sense alone. Bhattacharyya [3] has shown that there exists a lower bound to the
variance of $\theta^*(x_1, x_2, \cdots, x_N)$ which is higher than or equal to the one given in
(1.1.1). This lower bound is ${}_{(m)}\lambda^{11}$, that is,

(1.1.4)    $\sigma^2(\theta^*(x_1, x_2, \cdots, x_N)) \geq {}_{(m)}\lambda^{11}$

where

$\| {}_{(m)}\lambda^{ij} \| = \| \lambda_{ij} \|^{-1}$,

and

(1.1.5)    $\lambda_{ij} = E(\phi_i(N)\, \phi_j(N))$,    $(i, j = 1, 2, \cdots, m)$,

where $m$ is any positive integer.



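Let $\theta$ consist of $T$ components $\theta_1, \theta_2, \cdots, \theta_T$, and let
$p_N(x_1, x_2, \cdots, x_N; \theta_1, \cdots, \theta_T)$ be the same as
$\prod_{i=1}^{N} f(x_i; \theta_1, \theta_2, \cdots, \theta_T)$. Further let $\theta_1^*(x_1, x_2, \cdots, x_N)$,
$\theta_2^*(x_1, x_2, \cdots, x_N)$, $\cdots$, $\theta_T^*(x_1, x_2, \cdots, x_N)$ be unbiased estimates of
$\theta_1, \theta_2, \cdots, \theta_T$ respectively, with the non-singular covariance matrix $\| V_{ij} \|$
$(i, j = 1, 2, \cdots, T)$. Cramér [4] has proved that under certain regularity conditions,
the ellipsoid

(1.1.6)    $\sum_{i,j=1}^{T} V^{ij}\, t_i t_j = T + 2$

contains within itself the ellipsoid

(1.1.7)    $\sum_{i,j=1}^{T} l_{ij}\, t_i t_j = T + 2$,

where

(1.1.8)    $\| V^{ij} \| = \| V_{ij} \|^{-1}$

and

(1.1.9)    $l_{ij} = N \cdot E\left(\dfrac{\partial \log f}{\partial \theta_i} \dfrac{\partial \log f}{\partial \theta_j}\right)$,    $(i, j = 1, 2, \cdots, T)$.

This result is also implicitly contained in Rao [2].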

§1.2. Let us now take $n$ as a chance variable determined by a sequential procedure.
$X_1, X_2, X_3, \cdots$ is a sequence of chance variables having the same
probability density or probability $f(x; \theta)$, according as $X$ is absolutely continuous
or discrete. The sequential process tells us, after each successive observation
has been drawn, whether the next observation is to be taken or not. Thus
$n$ will denote the total number of observations taken by the time the sequential
process has been completed. Under certain regularity conditions, Wolfowitz
[5] has shown that if $\theta^*(x_1, x_2, \cdots, x_n)$ is an unbiased estimate of $\theta$, then

(1.2.1)    $\sigma^2(\theta^*(x_1, x_2, \cdots, x_n)) \geq \dfrac{1}{En \cdot E\left(\dfrac{\partial}{\partial\theta} \log f(x; \theta)\right)^2}.$



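Furthermore, if $\theta$ consists of $T$ components, $\theta_1, \theta_2, \cdots, \theta_T$, and
$\theta_1^*(x_1, x_2, \cdots, x_n)$, $\theta_2^*(x_1, x_2, \cdots, x_n)$, $\cdots$, $\theta_T^*(x_1, x_2, \cdots, x_n)$ are unbiased
estimates of $\theta_1, \theta_2, \cdots, \theta_T$ respectively, Wolfowitz [5] has proved that the ellipsoid

(1.2.2)    $\sum_{i,j=1}^{T} l_{ij}\, t_i t_j = T + 2$

is contained within the ellipsoid

(1.2.3)    $\sum_{i,j=1}^{T} V^{ij}\, t_i t_j = T + 2$,

where

$l_{ij} = En \cdot E\left(\dfrac{\partial \log f}{\partial \theta_i} \dfrac{\partial \log f}{\partial \theta_j}\right)$,    $(i, j = 1, \cdots, T)$.
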


Blackwell and Girshick [6] have shown that the lower bound given by (1.2.1)
for the variance of an unbiased estimate of $\theta$ is attained only for the sequential
process for which $\Pr(n = N) = 1$, if the probability density function $f(x; \theta)$ of
$X$ is such that $E(X) = \theta$ and $x_1 + x_2 + x_3 + \cdots + x_M$ is a sufficient statistic for
all integral values of $M$, for estimating $\theta$; $x_1, x_2, \cdots, x_M$ being $M$ independent
observations on the chance variable $X$.

In this paper the following results have been obtained. The specific condi¬ 
tions under which the results hold are stated at their proper places along with 
the results: 

(1.3.1) The lower bound in (1.1.4) is valid when $n$ is considered a
chance variable determined by a sequential procedure instead of being a fixed
number $N$.

(1.3.2) The concentration ellipsoid defined in (1.2.3) contains within itself
another ellipsoid

$\sum_{i,j=1}^{T} \mu_{ij}\, t_i t_j = T + 2$

where $\mu_{ij}$ is given by (3.1.18), which in turn contains the ellipsoid given by
(1.2.2).

(1.3.3). The Blackwell and Girshick result [6] for the achievement of the lower
bound for the variance of unbiased estimates given by (1.2.1) has been extended
to the case where the probability density (or probability) $\prod_{i=1}^{M} f(x_i; \theta)$, for all
fixed $M \geq N$, where $N$ is the least value for which $\Pr(n = N) \neq 0$, has an
unbiased “efficient” estimate for $\theta$ in the sense defined by Cramér. This is
illustrated by two examples of Wald sequential procedures.

(1.3.4). Let $N$ be fixed and

$p_N(x_1, x_2, \cdots, x_N; \theta) \cdot |J| = g(\theta^*; \theta)\, h(\xi_1, \xi_2, \cdots, \xi_{N-1} \mid \theta^*; \theta)$,

where $J$ denotes the Jacobian of the transformation from $x_1, x_2, \cdots, x_N$ to
$\theta^*, \xi_1, \xi_2, \cdots, \xi_{N-1}$. Here $g(\theta^*; \theta)$ and
$h(\xi_1, \xi_2, \cdots, \xi_{N-1} \mid \theta^*; \theta)$ are respectively the probability density function (or
probability) of $\theta^*$ and the conditional probability density function (or probability)
of $\xi_1, \xi_2, \cdots, \xi_{N-1}$ for a given value of $\theta^*$.

The necessary and sufficient conditions under which the lower bound for the
variance of unbiased estimates given by Bhattacharyya [3] may be achieved are
that there should exist a statistic $\theta^*(x_1, x_2, \cdots, x_N)$ such that:

(a) $h_1, h_2, \cdots, h_m$ are linearly dependent considered as functions of
$\xi_1, \xi_2, \cdots, \xi_{N-1}$ for given values of $\theta$ and $\theta^*(x_1, x_2, \cdots, x_N)$ and

(b) the probability density $g(\theta^*; \theta)$ of $\theta^*(x_1, x_2, \cdots, x_N)$ satisfies the following
equation:

$\theta^*(x_1, x_2, \cdots, x_N) - \theta = \sum_{i=1}^{m} \dfrac{K_i}{g(\theta^*; \theta)} \dfrac{\partial^i}{\partial\theta^i}\, g(\theta^*; \theta)$,

where the $K_i$ are independent of $x_1, x_2, x_3, \cdots, x_N$.

Equivalent conditions for the multiparameter case have also been given.

(1.3.5). The following properties of $\phi_1(n), \phi_2(n), \cdots$ are derived:

(a) Under certain conditions $\phi_1(N), \phi_2(N), \cdots$ form a system of orthogonal
polynomials, the weight function being $p_N(x_1, x_2, \cdots, x_N; \theta)$.

(b) $\sum_{i=1}^{m} K_i \phi_i(n)$ cannot be a function of $x_1, x_2, \cdots, x_n$, independent of $\theta$,
except for the constant zero.

(c) If $\theta^*(x_1, x_2, \cdots, x_n)$ is linearly dependent upon $\phi_1(n)$, then no other
statistic except of the form $a\theta^*(x_1, x_2, \cdots, x_n) + b$, where $a$ and $b$ are
constants independent of $\theta$, can be linearly related with $\phi_1(n)$.

(1.3.6). If a) $\theta^*(x_1, x_2, \cdots, x_n)$ is an unbiased estimate of $\theta$ and b) if among
all functions of $\theta^*(x_1, x_2, \cdots, x_n)$ which are unbiased estimates of $\theta$ with finite
variance, $\theta^*$ is the one with the least variance and such that the set of polynomials
with respect to the distribution function of $\theta^*$ is complete, then there is
no other function of $\theta^*$ having a finite variance which is an unbiased estimate of $\theta$.

2. Estimation of a single parameter. 

§2.1. Let $X_1, X_2, \cdots$ and $p_M(x_1, x_2, \cdots, x_M; \theta)$ be as given in the first paragraph
of (1.1). Let $\Omega$ be the space of all possible infinite sequences $(\omega)$ of observations
$x_1, x_2, \cdots$. Let there be given an infinite sequence of Borel measurable
functions $\Phi_1(x_1), \Phi_2(x_1, x_2), \cdots, \Phi_j(x_1, x_2, x_3, \cdots, x_j), \cdots$, defined for
all observable sequences in $\Omega$ such that each takes only the values zero and one.
We further assume that everywhere in $\Omega$, except possibly on a set whose probability
is zero for all $\theta$ under consideration, at least one of the functions $\Phi_1(x_1)$,
$\Phi_2(x_1, x_2), \cdots$ takes the value of one. Let $n$ be the smallest integer for which
this occurs. Thus $n(\omega)$ is a chance variable. The sequential process is then
defined as follows:

Take an observation and find $\Phi_1(x_1)$. If it is unity, the sampling process stops;
otherwise continue sampling. If a second observation is taken and the value of
$\Phi_2(x_1, x_2)$ is unity, the process stops; otherwise continue sampling, and so on.
In general, if after taking $j$ observations

$\Phi_i(x_1, x_2, \cdots, x_i) = 0$ for $i = 1, 2, \cdots, j - 1$,

and $\Phi_j(x_1, x_2, \cdots, x_j) = 1$, sampling stops; otherwise it is continued. We
will denote by $R_j$ the set of all points $(x_1, x_2, \cdots, x_j)$ for which the process
stops with the $j$th observation.
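
The stopping mechanism of §2.1 can be sketched in a few lines of code. The sketch below is only an illustration under stated assumptions: the names `run_sequential`, `fixed_size_rule` and `sum_exceeds_rule` are hypothetical and do not appear in the paper, and a single function of the observations collected so far stands in for the whole sequence of zero-one functions $\Phi_1, \Phi_2, \cdots$.

```python
from typing import Callable, Iterable, List, Tuple

# A stopping rule stands in for the sequence Phi_1, Phi_2, ...: it sees the
# observations collected so far and returns 1 (stop) or 0 (continue).
StopRule = Callable[[List[float]], int]


def run_sequential(observations: Iterable[float], phi: StopRule) -> Tuple[int, List[float]]:
    """Return (n, sample), where n is the smallest j with phi(x_1..x_j) == 1."""
    sample: List[float] = []
    for x in observations:
        sample.append(x)
        if phi(sample) == 1:
            return len(sample), sample
    raise ValueError("the stopping rule never returned 1 on this sequence")


def fixed_size_rule(N: int) -> StopRule:
    """Hypothetical rule with Pr(n = N) = 1, i.e. a fixed-size sample."""
    return lambda xs: 1 if len(xs) == N else 0


def sum_exceeds_rule(c: float) -> StopRule:
    """Hypothetical rule: stop as soon as the running sum exceeds c."""
    return lambda xs: 1 if sum(xs) > c else 0


n, sample = run_sequential([0.5, 1.0, 2.0, 4.0], sum_exceeds_rule(3.0))
print(n, sample)
```

The fixed-size rule shows how the classical case $n = N$ fits into the same framework as a degenerate sequential process.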

Let $\theta^*(x_1, x_2, \cdots, x_n)$ be a statistic whose expectation is a real valued function
$\gamma(\theta)$ of $\theta$. The development proceeds on the assumption that
$p_M(x_1, x_2, \cdots, x_M; \theta)$ is a probability density function. The result is equally
valid if $p_M(x_1, x_2, \cdots, x_M; \theta)$ is the probability of discrete variables
$X_1, X_2, \cdots, X_M$, provided that integration is replaced by summation whenever
this is required. Further, the phrase “almost all points” in a Euclidean space of
any finite dimensionality is understood to mean all points in the space with the
following possible exceptions:

(a). A set of Lebesgue measure zero where $p_M(x_1, x_2, \cdots, x_M; \theta)$ is the probability
density function;

(b). The points which belong to the set $Z$, where $p_M(x_1, x_2, \cdots, x_M; \theta)$ is the
probability function of the discrete chance variables $X_1, X_2, \cdots, X_M$. The
set $Z$ consists of all points $(x_1, x_2, \cdots, x_M)$ such that $p_M(x_1, x_2, \cdots, x_M; \theta) = 0$
identically for all $\theta$ under consideration.

§2.2. Conditions of regularity. We will postulate the following conditions to
be satisfied by $p_M(x_1, x_2, \cdots, x_M; \theta)$ and $\theta^*(x_1, x_2, \cdots, x_n)$.

(2.2.1). $\theta^*(x_1, x_2, \cdots, x_n)$ has an expectation $\gamma(\theta)$ and a finite variance. All
the derivatives of $\gamma(\theta)$ are assumed to be finite. The parameter $\theta$ lies in an open
interval $D$ of the real line. $D$ may consist of the entire line or an entire half line.



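(2.2.2). The derivatives

$\dfrac{\partial^i p_M}{\partial\theta^i}$,    $(i = 1, 2, \cdots, m)$,

exist for all $\theta$ in $D$ and almost all $x_1, x_2, \cdots, x_M$ in $R_M$ and for all $M$. We define

$\dfrac{1}{p_M} \dfrac{\partial^i p_M}{\partial\theta^i} = 0$

whenever $p_M(x_1, x_2, \cdots, x_M; \theta) = 0$; thus,

$\phi_i(M) = \dfrac{1}{p_M} \dfrac{\partial^i p_M}{\partial\theta^i}$

is defined for all $\theta$ in $D$ and almost all $(x_1, x_2, \cdots, x_M)$ in $R_M$.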

(2.2.3). For any integral $j$ there exist non-negative $L$-measurable functions
$T_i(x_1, x_2, \cdots, x_j)$, $(i = 1, 2, \cdots, m)$, such that

(a) $\left| \theta^*(x_1, x_2, \cdots, x_j)\, \dfrac{\partial^i}{\partial\theta^i}\, p_j(x_1, x_2, \cdots, x_j; \theta) \right| < T_i(x_1, x_2, \cdots, x_j)$,

for all $\theta$ in $D$ and almost all $(x_1, x_2, \cdots, x_j)$ in $R_j$.

(b) $\int_{R_j} T_i(x_1, x_2, \cdots, x_j) \prod_{u=1}^{j} dx_u$,    $(i = 1, 2, \cdots, m)$,

are finite.

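(2.2.4). Let $t_j(\theta) = \int_{R_j} \theta^*(x_1, x_2, \cdots, x_j)\, p_j(x_1, x_2, \cdots, x_j; \theta) \prod_{u=1}^{j} dx_u$.

We postulate the uniform convergence of

$\sum_{j=1}^{\infty} \dfrac{\partial^i}{\partial\theta^i}\, t_j(\theta)$,    $(i = 1, 2, \cdots, m)$

(the existence of $\dfrac{\partial^i}{\partial\theta^i}\, t_j(\theta)$ is assured by the assumption (2.2.3)).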

(2.2.5). There exist functions $S_i(x_1, x_2, \cdots, x_j)$ for every $j$, $(i = 1, 2, \cdots, m)$,
such that when $\theta^*(x_1, x_2, \cdots, x_j)$ and $T_i(x_1, x_2, \cdots, x_j)$ are replaced by
unity and $S_i(x_1, x_2, \cdots, x_j)$ respectively, conditions (2.2.3) and (2.2.4) still
hold good.

(2.2.6). The covariance matrix of $\phi_i(n)$ $(i = 1, \cdots, m)$ exists and is non-singular
for almost all $\theta$ in $D$ and almost all $(x_1, x_2, \cdots, x_n)$.

§2.3. Let us consider the sequential process mentioned in §2.1 and the functions
$\theta^*(x_1, x_2, \cdots, x_n)$ and $p_M(x_1, x_2, \cdots, x_M; \theta)$ which satisfy the regularity
conditions in §2.2. We will now find a lower bound for the variance of such estimates.

Let us examine

(2.3.1)    $F = E\left(\theta^*(x_1, x_2, \cdots, x_n) - \gamma(\theta) - \sum_{i=1}^{m} K_i \phi_i(n)\right)^2$,


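where $K_i$ $(i = 1, 2, \cdots, m)$ are independent of $(x_1, x_2, \cdots, x_n)$. Now (2.3.1)
can be written as

(2.3.2)    $F = \sigma^2(\theta^*(x_1, x_2, \cdots, x_n)) - 2 \sum_{i=1}^{m} K_i\, E(\theta^*(x_1, x_2, \cdots, x_n)\, \phi_i(n)) + 2\gamma(\theta) \sum_{i=1}^{m} K_i\, E\phi_i(n) + \sum_{i,j=1}^{m} K_i K_j \lambda_{ij}$,

where

(2.3.3)    $\lambda_{ij} = E(\phi_i(n)\, \phi_j(n))$,    $(i, j = 1, 2, \cdots, m)$.

Now

(2.3.4)    $E(\theta^*(x_1, x_2, \cdots, x_n)\, \phi_i(n)) = \sum_{j=1}^{\infty} \int_{R_j} \theta^*(x_1, x_2, \cdots, x_j)\, \dfrac{\partial^i p_j}{\partial\theta^i} \prod_{u=1}^{j} dx_u$.
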

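We also know that

(2.3.5)    $\sum_{j=1}^{\infty} \int_{R_j} \theta^*(x_1, x_2, \cdots, x_j)\, p_j \prod_{u=1}^{j} dx_u = \gamma(\theta)$.

Differentiating both sides of (2.3.5) $i$ times $(i = 1, 2, \cdots, m)$ we have, because
of conditions (2.2.3) and (2.2.4):

(2.3.6)    $\sum_{j=1}^{\infty} \int_{R_j} \theta^*(x_1, x_2, \cdots, x_j)\, \dfrac{\partial^i p_j}{\partial\theta^i} \prod_{u=1}^{j} dx_u = \dfrac{d^i \gamma(\theta)}{d\theta^i}$,    $(i = 1, 2, \cdots, m)$.

From (2.3.4) and (2.3.6), we obtain

(2.3.7)    $E(\theta^*(x_1, x_2, \cdots, x_n)\, \phi_i(n)) = \dfrac{d^i \gamma(\theta)}{d\theta^i}$.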

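Differentiating

(2.3.8)    $1 = \sum_{j=1}^{\infty} \int_{R_j} p_j \prod_{u=1}^{j} dx_u$

$i$ times $(i = 1, 2, \cdots, m)$ with respect to $\theta$, we obtain because of conditions
(2.2.5)

(2.3.9)    $0 = \sum_{j=1}^{\infty} \int_{R_j} \dfrac{\partial^i p_j}{\partial\theta^i} \prod_{u=1}^{j} dx_u$,    $(i = 1, \cdots, m)$.

(2.3.8) is valid on account of the type of sequential process (2.1). Now

(2.3.10)    $E(\phi_i(n)) = \sum_{j=1}^{\infty} \int_{R_j} \dfrac{\partial^i p_j}{\partial\theta^i} \prod_{u=1}^{j} dx_u$,    $(i = 1, \cdots, m)$,

which vanishes by (2.3.9).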

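By (2.3.7) and (2.3.10), (2.3.2) reduces to

(2.3.11)    $F = \sigma^2(\theta^*(x_1, x_2, \cdots, x_n)) - 2 \sum_{i=1}^{m} K_i \dfrac{d^i \gamma(\theta)}{d\theta^i} + \sum_{i,j=1}^{m} K_i K_j \lambda_{ij}$.

Now $\| \lambda_{ij} \|$ being non-singular on account of condition (2.2.6), we get just
one set of values of $K_j$ which minimize $F$. These values are given by

(2.3.12)    $K_j = \sum_{i=1}^{m} \dfrac{d^i \gamma(\theta)}{d\theta^i}\; {}_{(m)}\lambda^{ij}$,    $(j = 1, 2, \cdots, m)$,
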

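where

(2.3.13)    $\| {}_{(m)}\lambda^{ij} \| = \| \lambda_{ij} \|^{-1}$,    $(i, j = 1, 2, \cdots, m)$.

Putting the above values of $K_j$ $(j = 1, 2, \cdots, m)$ in (2.3.11), we obtain

(2.3.14)    $F = \sigma^2(\theta^*(x_1, x_2, \cdots, x_n)) - \sum_{i,j=1}^{m} {}_{(m)}\lambda^{ij}\, \dfrac{d^i \gamma(\theta)}{d\theta^i} \dfrac{d^j \gamma(\theta)}{d\theta^j}$.
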

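Hence, $F$ being non-negative by (2.3.1), we have

(2.3.15)    $\sigma^2(\theta^*(x_1, x_2, \cdots, x_n)) \geq \sum_{i,j=1}^{m} {}_{(m)}\lambda^{ij}\, \dfrac{d^i \gamma(\theta)}{d\theta^i} \dfrac{d^j \gamma(\theta)}{d\theta^j}$.

Thus the right hand side of the above inequality gives the lower bound to the variance of
unbiased estimates of $\gamma(\theta)$.¹ When $\gamma(\theta) = \theta$, the above reduces to

(2.3.16)    $\sigma^2(\theta^*(x_1, x_2, \cdots, x_n)) \geq {}_{(m)}\lambda^{11}$.

When $m = 1$ and $p_n(x_1, x_2, \cdots, x_n; \theta) = \prod_{i=1}^{n} f(x_i; \theta)$, (2.3.16) reduces to

(2.3.17)    $\sigma^2(\theta^*(x_1, x_2, \cdots, x_n)) \geq \dfrac{1}{En \cdot E\left(\dfrac{\partial \log f}{\partial\theta}\right)^2}$,

which is the result given by Wolfowitz [5].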

When $n$, the chance variable, is constant and equal to $N$, then (2.3.15) and
(2.3.16) correspond to those given by Bhattacharyya [3]. Although the conditions
of regularity under which Bhattacharyya proves his results are not clear
from his paper, they are likely to be slightly different from those in §2.3, as the
results in [3] are obtained only for a fixed size sample.

§2.4. We will now investigate the necessary and sufficient conditions under
which the lower bound given in (2.3.16) is actually higher than that given in
(2.3.17).

We can easily see that

(2.4.1)    ${}_{(m)}\lambda^{11} = \dfrac{1}{\lambda_{11}\left(1 - R^2_{1.23 \cdots m}\right)}$,

where $R_{1.23 \cdots m}$ is the multiple correlation coefficient between $\phi_1(n)$ and
$\phi_2(n), \cdots, \phi_m(n)$.

The excess of the lower bound given by (2.3.16) over that when we use $m = 1$
is given by

(2.4.2)    $\dfrac{1}{\lambda_{11}\left(1 - R^2_{1.23 \cdots m}\right)} - \dfrac{1}{\lambda_{11}}$,

which is further equal to

(2.4.3)    $\dfrac{R^2_{1.23 \cdots m}}{\lambda_{11}\left(1 - R^2_{1.23 \cdots m}\right)}$.

Thus the lower bound for the variance of unbiased estimates of $\theta$ obtained
by using $m > 1$ is higher than that obtained by employing $m = 1$ if and only if
$R_{1.23 \cdots m}$ is not zero for some $m \geq 2$. This is equivalent to the condition that for
at least one $i \geq 2$, $\rho_{1i}$, the correlation coefficient between $\phi_1(n)$ and $\phi_i(n)$ $(i > 1)$,
is different from zero. Suppose further that we have used $m = a$ and that we
wish to find the increase in the lower bound if $a$ were replaced by $a + 1$. The
increase in this case is given by

(2.4.4)    $\dfrac{\rho^2_{1(a+1).23 \cdots a}}{\lambda_{11}\left(1 - R^2_{1.23 \cdots (a+1)}\right)}$,

where $\rho_{1(a+1).23 \cdots a}$ is the partial correlation coefficient between $\phi_1(n)$ and $\phi_{a+1}(n)$
keeping $\phi_2(n), \cdots, \phi_a(n)$ fixed. It is greater than zero if and only if $\rho_{1(a+1).23 \cdots a}$
is not equal to zero.

¹ Under certain weak restrictions, an optimum lower bound to the variance of unbiased
estimates has been obtained by me along the lines of a similar result for fixed size samples
in an unpublished paper by A. Wald. Independently C. Stein has obtained the same result
in a paper not yet published.
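
The identity (2.4.1) can be checked numerically on a small example. The sketch below assumes a hypothetical positive definite $3 \times 3$ matrix standing in for $\| \lambda_{ij} \|$ (it is not derived from any particular distribution) and uses exact rational arithmetic; the helper names are illustrative only. The squared multiple correlation is computed as $R^2 = b' \Lambda_{22}^{-1} b / \lambda_{11}$, where $b$ holds the covariances of $\phi_1(n)$ with the remaining functions.

```python
from fractions import Fraction


def inverse(matrix):
    """Invert a square matrix by Gauss-Jordan elimination in exact arithmetic."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    a = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if a[r][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0:
                factor = a[r][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]


def bound_from_inverse(lam):
    """The (1,1) element of the inverse of ||lambda_ij||, as in (2.3.16)."""
    return inverse(lam)[0][0]


def bound_from_multiple_correlation(lam):
    """1 / (lambda_11 (1 - R^2)), the right hand side of (2.4.1)."""
    b = lam[0][1:]
    l22_inv = inverse([row[1:] for row in lam[1:]])
    k = len(b)
    quad = sum(b[i] * l22_inv[i][j] * b[j] for i in range(k) for j in range(k))
    r_squared = Fraction(quad) / Fraction(lam[0][0])
    return 1 / (Fraction(lam[0][0]) * (1 - r_squared))


# Hypothetical moment matrix of phi_1, phi_2, phi_3 (an assumption for illustration).
lam = [[4, 2, 1], [2, 3, 1], [1, 1, 2]]
print(bound_from_inverse(lam), bound_from_multiple_correlation(lam), Fraction(1, lam[0][0]))
```

Both routes give the same value, and it exceeds $1/\lambda_{11}$, the bound for $m = 1$, because $\phi_1$ is correlated with the other functions in this example.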

§2.5. If $p_n(x_1, x_2, \cdots, x_n; \theta_1)$ also depends upon a finite number of other
parameters $\theta_2, \theta_3, \cdots, \theta_T$, then a lower bound higher than or equal to that
given in (2.3.16) can be obtained by using

$\sum_{i_1 + i_2 + \cdots + i_T \leq m} K_{i_1, i_2, \cdots, i_T}\, \phi_{i_1, i_2, \cdots, i_T}(n)$ instead of $\sum_{i=1}^{m} K_i \phi_i(n)$ in (2.3.1).

The lower bound in this case is given by (3.1.14) (see section 3) by taking $s = 1$,
that is,

(2.5.1)    $\sigma^2(\theta_1^*(x_1, x_2, \cdots, x_n)) \geq C(1, 1)$

where $C(1, 1)$ is the element in the first row and first column of the inverse of $W$
defined in (3.1.9).

The result for $n = N$, $N$ fixed, is obtained by Bhattacharyya [3, 1947]. Let
us illustrate it by an example. Take samples of fixed size $N$. Suppose we are
required to find the lower bound to the variance of unbiased estimates of $\theta_1$
in the normal population

(2.5.2)    $f(x; \theta_1, \theta_2) = \dfrac{1}{\sqrt{2\pi\theta_1}} \exp\left(-\dfrac{(x - \theta_2)^2}{2\theta_1}\right)$

on the basis of $N$ independent observations $x_1, x_2, \cdots, x_N$. The lower bound
for the variance of the unbiased estimates of $\theta_1$, when we use
$\sum_{i=1}^{m} K_i \phi_i(N)$ in (2.3.1), is given by $2\theta_1^2 / N$.

However, if $\sum_{i_1 + i_2 \leq 2} K_{i_1, i_2}\, \phi_{i_1, i_2}(N)$ is used, the lower bound, by the help of
(2.5.1), is found to be equal to $2\theta_1^2 / (N - 1)$. In fact there exists the statistic

$\dfrac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}$

whose variance is equal to $2\theta_1^2 / (N - 1)$, where $\bar{x} = \sum_{i=1}^{N} x_i / N$. Thus the use of
$\sum_{i_1 + i_2 \leq 2} K_{i_1, i_2}\, \phi_{i_1, i_2}(N)$ brings into relief the unbiased estimate with the
least variance.
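
A quick simulation makes the example concrete. The sketch below is an illustration only, with hypothetical function names and arbitrarily chosen parameter values; it assumes that $\theta_1$ is the variance and $\theta_2$ the mean of the normal population, and compares the simulated variance of the statistic $\sum (x_i - \bar{x})^2 / (N - 1)$ with the two bounds $2\theta_1^2 / N$ and $2\theta_1^2 / (N - 1)$.

```python
import random


def crude_bound(theta1, N):
    """The bound 2 * theta1^2 / N quoted in the text for the first choice."""
    return 2.0 * theta1 ** 2 / N


def improved_bound(theta1, N):
    """The bound 2 * theta1^2 / (N - 1) quoted in the text for the second choice."""
    return 2.0 * theta1 ** 2 / (N - 1)


def simulated_variance_of_s2(theta1, theta2, N, trials, seed=0):
    """Monte Carlo variance of sum((x - xbar)^2) / (N - 1) for normal samples."""
    rng = random.Random(seed)
    sd = theta1 ** 0.5  # theta1 is taken to be the population variance
    values = []
    for _ in range(trials):
        xs = [rng.gauss(theta2, sd) for _ in range(N)]
        xbar = sum(xs) / N
        values.append(sum((x - xbar) ** 2 for x in xs) / (N - 1))
    mean = sum(values) / trials
    return sum((v - mean) ** 2 for v in values) / trials


estimate = simulated_variance_of_s2(theta1=2.0, theta2=1.0, N=5, trials=40000)
print(estimate, crude_bound(2.0, 5), improved_bound(2.0, 5))
```

With a large number of trials the simulated variance should settle near the improved bound and stay above the cruder one, which is what the example in the text asserts.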


3. Multi-parameter case. In this section we will prove the result mentioned 
in (1.3.2) of §1.3. 

§3.1. Let $\theta$ consist of $T$ components $(\theta_1, \theta_2, \cdots, \theta_T)$ and $\theta_1^*, \theta_2^*, \cdots, \theta_T^*$ be
unbiased estimates of $\theta_1, \theta_2, \cdots, \theta_T$ respectively. Also, let a sequential process
of the type described in §2.1 be given. We postulate the following regularity conditions:

(3.1.1). The covariance matrix $\| V_{ij} \|$ of the estimates $\theta_i^*$ $(i = 1, 2, \cdots, T)$ is
non-singular in $D$, where $D$ is an open interval of the $T$-dimensional parameter
space.

(3.1.2). The conditions of section (2.2) are satisfied for each one of
$\theta_i^*$ $(i = 1, 2, \cdots, T)$ and $\phi_{i_1, i_2, \cdots, i_T}(n)$, $(i_1 + i_2 + \cdots + i_T \leq m)$.

(3.1.3). The covariance matrix of $\phi_{i_1, i_2, \cdots, i_T}(n)$, $i_1 + i_2 + \cdots + i_T \leq m$, exists
and is non-singular. Under the assumptions (3.1.1)-(3.1.3), we prove the result
(1.3.2) in section 1.3.

Proof: Using the same arguments of §2.3, we obtain

(3.1.4)    $E(\theta_j^*\, \phi_{i_1, i_2, \cdots, i_T}(n)) = 1$, if $i_j = 1$ and $i_k = 0$ for $k \neq j$, $(j, k = 1, 2, \cdots, T)$,

(3.1.5)    $= 0$ otherwise.

Let the covariance matrix of $\theta_j^*$ $(j = 1, 2, \cdots, s;\ s \leq T)$ and $\phi_{i_1, i_2, \cdots, i_T}(n)$,
$(i_1 + i_2 + \cdots + i_T \leq m)$ be given by

(3.1.6)    $U = \begin{Vmatrix} A & B \\ B' & W \end{Vmatrix}$

where

(3.1.7)    $A = \| V_{ij} \|$,    $i, j = 1, 2, \cdots, s;\ s \leq T$;

(3.1.8)    $B = \| I_s,\ 0 \|$;

(3.1.9)    and $W$ = covariance matrix of the set

$[\phi_{i_1, i_2, \cdots, i_T}(n) \mid i_1 + i_2 + \cdots + i_T \leq m]$,

arranged such that the $j$th term in the leading diagonal is given by

(3.1.10)    $E(\phi^2_{i_1, i_2, \cdots, i_T}(n))$, where $i_j = 1$, $i_k = 0$, $k \neq j$, $(j = 1, 2, \cdots, T)$,

and $B'$ is the transpose of $B$.

As $U$ is positive semi-definite, we have

(3.1.11)    $|U| \geq 0$.

The above can further be reduced to

(3.1.12)    $|W| \cdot |A - B W^{-1} B'| \geq 0$,

which leads to

(3.1.13)    $|A - B W^{-1} B'| \geq 0$, as $W$ is positive definite.

By the use of (3.1.8) we obtain from above

(3.1.14)    $|A - C| \geq 0$

where $C$ is the top left part of $W^{-1}$ consisting of $s$ rows and $s$ columns.
Let us now consider the matrix

(3.1.15)    $\| V_{ij} - u_{ij} \|$,    $(i, j = 1, 2, \cdots, T)$,

where $\| u_{ij} \|$ is the top left part of $W^{-1}$ consisting of $T$ rows and $T$ columns, and
is equal to

(3.1.16)    $\| W_{11} - W_{12} W_{22}^{-1} W_{21} \|^{-1}$,

when $W$ is written as

(3.1.17)    $\begin{Vmatrix} W_{11} & W_{12} \\ W_{21} & W_{22} \end{Vmatrix}$

where $W_{11}$ has $T$ rows and $T$ columns.

By the repeated application of (3.1.14), we are led to the conclusion that all
the leading minors of the matrix in (3.1.15) are either positive or zero. Hence
the matrix in (3.1.15) is semi-positive definite.

If now we put

(3.1.18)    $\| \mu_{ij} \| = \| u_{ij} \|^{-1}$

we obtain

(3.1.19)    $\| \mu_{ij} - V^{ij} \|$

to be semi-positive definite. Thus the ellipsoid

(3.1.20)    $\sum_{i,j=1}^{T} V^{ij}\, t_i t_j = T + 2$

contains within itself the ellipsoid

(3.1.21)    $\sum_{i,j=1}^{T} \mu_{ij}\, t_i t_j = T + 2$.

Cramér calls the ellipsoid in (3.1.20) a “concentration” ellipsoid.

We will now show that the ellipsoid given by (3.1.21) contains within itself
the ellipsoid

(3.1.22)    $\sum_{i,j=1}^{T} l_{ij}\, t_i t_j = T + 2$

where $\| l_{ij} \|$ is the information matrix given by $W_{11}$ in (3.1.17). We will prove
the above by showing

(3.1.23)    $\| l_{ij} - \mu_{ij} \|$,    $(i, j = 1, \cdots, T)$,

to be semi-positive definite.

We obtain, from (3.1.16) and (3.1.18),

(3.1.24)    $\| \mu_{ij} \| = W_{11} - W_{12} W_{22}^{-1} W_{21}$,    $(i, j = 1, 2, \cdots, T)$.

From the above it follows that

(3.1.25)    $\| l_{ij} - \mu_{ij} \| = W_{12} W_{22}^{-1} W_{21}$.

Since the matrix on the right hand side is semi-positive definite, $W_{22}$ being
positive definite, we see that the ellipsoid (3.1.21) contains within itself the
ellipsoid given by (3.1.22). This proves the assertion made in (1.3.2) of §1.3. It
may be seen that (3.1.22) is strictly contained in (3.1.21) if and only if $W_{12} \neq 0$.
It may be mentioned that in this section as well as elsewhere, $T + 2$, appearing
on the right hand side of the equation of an ellipsoid, can be replaced by any
positive constant. Also the ellipsoid in (3.1.21) depends upon the choice of $m$
and it can be shown that for any two positive integers $m_1$, $m_2$ $(m_1 > m_2)$ the
ellipsoid for $m = m_1$ contains within itself the one for $m = m_2$.
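
The matrix identities behind (3.1.16), (3.1.24) and (3.1.25) can be verified on a small numerical case. The sketch below assumes a hypothetical positive definite $3 \times 3$ matrix $W$ with $T = 2$, so that $W_{22}$ is a single number; the helper names are illustrative and exact rational arithmetic is used throughout.

```python
from fractions import Fraction


def inverse(matrix):
    """Invert a square matrix by Gauss-Jordan elimination in exact arithmetic."""
    n = len(matrix)
    a = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if a[r][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0:
                factor = a[r][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]


def schur_mu(W, T):
    """W_11 - W_12 W_22^{-1} W_21 for the leading T x T block W_11 of W."""
    n = len(W)
    w22_inv = inverse([row[T:] for row in W[T:]])
    mu = []
    for i in range(T):
        row = []
        for j in range(T):
            correction = sum(W[i][T + a] * w22_inv[a][b] * W[T + b][j]
                             for a in range(n - T) for b in range(n - T))
            row.append(Fraction(W[i][j]) - correction)
        mu.append(row)
    return mu


# Hypothetical covariance matrix W (an assumption for illustration), with T = 2.
W = [[4, 2, 1], [2, 3, 1], [1, 1, 2]]
T = 2
mu = schur_mu(W, T)                             # as in (3.1.24)
u = [row[:T] for row in inverse(W)[:T]]         # top left part of W^{-1}
diff = [[Fraction(W[i][j]) - mu[i][j] for j in range(T)] for i in range(T)]  # (3.1.25)
print(mu, inverse(u), diff)
```

In this example the inverse of the top left block of $W^{-1}$ reproduces $\| \mu_{ij} \|$, and the difference $\| l_{ij} - \mu_{ij} \|$ has non-negative diagonal entries and a non-negative determinant, as a semi-positive definite $2 \times 2$ matrix must.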

§3.2. In general, let $\theta_i^*(x_1, x_2, \cdots, x_n)$ be statistics whose expectations are
$\gamma_i(\theta_1, \theta_2, \cdots, \theta_T)$, $(i = 1, 2, \cdots, T)$, the latter being assumed to admit partial
derivatives of all possible orders. Under the postulates enumerated in §3.1,
we see that the ellipsoid in (3.1.20) contains within itself the ellipsoid

(3.2.1)    $\sum_{i,j=1}^{T} s_{ij}\, t_i t_j = T + 2$

where

(3.2.2)    $\| s_{ij} \| = \| \Gamma W^{-1} \Gamma' \|^{-1}$,    $i, j = 1, 2, \cdots, T$,

and

(3.2.3)    $\Gamma = \left\| \dfrac{\partial^{\,i_1 + i_2 + \cdots + i_T}}{\partial\theta_1^{i_1}\, \partial\theta_2^{i_2} \cdots \partial\theta_T^{i_T}}\, \gamma_j(\theta_1, \theta_2, \cdots, \theta_T) \right\|$,    $(j = 1, 2, \cdots, T;\ i_1 + i_2 + \cdots + i_T \leq m)$,

where $j$ and $i_1 + i_2 + \cdots + i_T$ indicate the number of the row and the column
respectively and $\Gamma$ is arranged to correspond to the arrangement of $W$, where $W$
is the same as given in (3.1.9).

4. Achievement of the different lower bounds. In §4.1 we will demonstrate
the desirability of finding a higher lower bound to the variance of sequential
estimates than that given by Wolfowitz, by giving two examples in which the
latter is not achieved. From §2.4 it is clear that this will be so if $E(\phi_1(n) \cdot \phi_i(n))$
is not zero for at least one value of $i \geq 2$. We will demonstrate that this is true
for $i = 2$. In §4.2 we show that if an “efficient” statistic exists for all $M \geq N$,
the bound is achieved only in the case when the sample size is fixed. In §4.3
we obtain necessary and sufficient conditions for the attainment of the bound
given in (1.1.4). In §4.4 we discuss the conditions under which there exists a
“concentration ellipsoid” which coincides with the ellipsoid given in (3.1.21) for
samples of fixed size $N$.

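§4.1. Ex. 1. The Wald sequential procedure for testing $\theta = \theta_1$ against $\theta = \theta_2$ in a normal population

(4.1.1) $f(x; \theta) = \frac{1}{\sqrt{2\pi}} \exp\left[-\tfrac{1}{2}(x - \theta)^2\right]$

is given as follows: If

(4.1.2) $B < \sum_{i=1}^{s} \left(x_i - \tfrac{1}{2}(\theta_1 + \theta_2)\right) < A$, $(s = 1, 2, \ldots, j - 1)$,

and

(4.1.3) $\sum_{i=1}^{j} \left(x_i - \tfrac{1}{2}(\theta_1 + \theta_2)\right)$ is either $> A$ or $< B$,

we cease sampling and make a decision. Here $A$ and $B$ are constants fixed by the probability levels of making a correct decision.

Let us denote the set of points satisfying (4.1.2) and (4.1.3) by $R_j$. In this case

(4.1.4) $\phi_1(n) = \sum_{i=1}^{n} (x_i - \theta) = Z_n - n\theta$, where $Z_n = \sum_{i=1}^{n} x_i$.

The above is differentiable with respect to $\theta$. On differentiating we have

(4.1.5) $\phi_2(n) = (Z_n - n\theta)^2 - n$.

Now

(4.1.6) $E(\phi_1(n) \cdot \phi_2(n)) = E(Z_n - n\theta)^3 - E(n(Z_n - n\theta))$.

By theorem 7.3, Wolfowitz [5],

(4.1.7) $E(Z_n - n\theta)^3 = En \cdot E(X - \theta)^3 + 3E(n(Z_n - n\theta))$,

where $X$ has the distribution given in (4.1.1). As $E(X - \theta)^3$ is equal to zero, (4.1.6) reduces to

(4.1.8) $E(\phi_1(n) \cdot \phi_2(n)) = 2E(n(Z_n - n\theta))$.

We will now show that the right-hand side of (4.1.8) is not identically zero in $\theta$. Let us consider

(4.1.9) $E(n) = \sum_{j=1}^{\infty} j \int_{R_j} (2\pi)^{-j/2} \exp\left[-\tfrac{1}{2}\sum_{i=1}^{j}(x_i - \theta)^2\right] \prod_{i=1}^{j} dx_i$.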





G. R. SETH


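Differentiating with respect to $\theta$, we get

(4.1.10) $\frac{d}{d\theta}(E(n)) = \sum_{j=1}^{\infty} j \int_{R_j} \left[\sum_{i=1}^{j}(x_i - \theta)\right] (2\pi)^{-j/2} \exp\left[-\tfrac{1}{2}\sum_{i=1}^{j}(x_i - \theta)^2\right] \prod_{i=1}^{j} dx_i$.

The right-hand side of the above equation being equal to $E(n(Z_n - n\theta))$, the latter does not vanish identically in $\theta$, because the left-hand side is not identically zero. The step from (4.1.9) to (4.1.10) can be easily seen to be valid.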

Ex. 2. The Wald sequential procedure for testing $p = p_1$ against $p = p_2$ in a binomial distribution, where $p$ is the probability of the event occurring, is given as follows: If

(4.1.11) $B < \sum_{i=1}^{s} (x_i - d) < A$, $s = 1, 2, \ldots, j - 1$,

and

(4.1.12) $\sum_{i=1}^{j} (x_i - d)$ is either $> A$ or $< B$,

where $d$ is given by $\left[\log \frac{1 - p_1}{1 - p_2}\right] / \log \frac{p_2(1 - p_1)}{p_1(1 - p_2)}$, the process stops with the $j$th observation and a decision is taken. Here $x_i$ is the characteristic function of the event at the $i$th trial, that is:

$x_i = 1$, when the event occurs at the $i$th trial;

$x_i = 0$, otherwise.

Let us denote the set of points satisfying (4.1.11) and (4.1.12) by $R_j$. In this case we find

(4.1.13) $E[\phi_1(n) \cdot \phi_2(n)] = \frac{2}{p^2(1 - p)^2} \cdot E(n(Z_n - np))$,

where $Z_n = \sum_{i=1}^{n} x_i$. We have now to show that the right-hand side is not identically zero. Differentiating

(4.1.14) $E(n) = \sum_{j=1}^{\infty} j \sum_{R_j} p^{Z_j}(1 - p)^{j - Z_j}$

with regard to $p$, we obtain

(4.1.15) $\frac{d}{dp}(E(n)) = \sum_{j=1}^{\infty} j \sum_{R_j} (Z_j - jp) \cdot p^{Z_j - 1}(1 - p)^{j - Z_j - 1}$.

The right-hand side of the above is the same, up to the positive factor $1/(p(1 - p))$, as

(4.1.16) $E(n(Z_n - np))$.

Thus, the left-hand side of (4.1.15) being not identically zero, the same is true for (4.1.16), and consequently the bound given by Wolfowitz is not achieved in this case.





The step from (4.1.14) to (4.1.15) is valid as 
(4.U7) t (£,y,1^;^) 

is absolutely and uniformly convergent. 
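
The relation between $\frac{d}{dp}E(n)$ and $E(n(Z_n - np))$ used in Ex. 2 can be checked numerically. The following is a minimal sketch and not part of the original paper: it assumes a truncated version of the stopping rule (sampling is forced to stop at `max_n`, so that all sums are finite), and the name `sprt_moments` and the values chosen for `d`, `A` and `B` are illustrative only. It computes $E(n)$ and $E(n(Z_n - np))$ exactly by following the probability mass of the paths that are still sampling, and compares a finite-difference derivative of $E(n)$ with $E(n(Z_n - np))/(p(1 - p))$.

```python
def sprt_moments(p, d=0.5, A=2.0, B=-2.0, max_n=400):
    """Return (E[n], E[n (Z_n - n p)]) for a Bernoulli sequential rule.

    Sampling continues while B < sum(x_i - d) < A.  Truncation at max_n
    is an assumption of this sketch, not part of the procedure in the text.
    """
    alive = {0: 1.0}  # successes so far -> probability of still sampling
    e_n = 0.0
    e_nz = 0.0
    for j in range(1, max_n + 1):
        nxt = {}
        for z, w in alive.items():
            for x, px in ((1, p), (0, 1.0 - p)):
                z2, mass = z + x, w * px
                s = z2 - j * d  # value of sum(x_i - d) after j trials
                if not (B < s < A) or j == max_n:
                    e_n += j * mass
                    e_nz += j * (z2 - j * p) * mass
                else:
                    nxt[z2] = nxt.get(z2, 0.0) + mass
        alive = nxt
        if not alive:
            break
    return e_n, e_nz


if __name__ == "__main__":
    p, h = 0.4, 1e-6
    e_n, e_nz = sprt_moments(p)
    slope = (sprt_moments(p + h)[0] - sprt_moments(p - h)[0]) / (2 * h)
    print(e_n, slope, e_nz / (p * (1 - p)))
```

With these illustrative boundaries the derivative of $E(n)$ at $p = 0.4$ is clearly different from zero, which is the property the example relies on.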

§4.2. Let $\theta^*$ be some unbiased estimate of $\theta$, where the $x_i$'s are successive independent observations on the chance variable $X$ having the probability density function or probability function $f(x; \theta)$. We adopt a sequential procedure mentioned in §2.1 satisfying the regularity conditions in §2.2 and also postulate the following:

(i) For all positive integral values of $M \geq N$,

$p_M(x_1, x_2, \ldots, x_M; \theta) = \prod_{i=1}^{M} f(x_i; \theta)$

possesses an ‘efficient’ estimate for $\theta$, where $N$ is the least value of $n$ for which $\Pr(n = N) \neq 0$.

(ii) $E(n)$ exists and admits derivatives up to the second order with respect to $\theta$. Furthermore, $\frac{d}{d\theta}(E(n))$ is either zero for all $\theta$ under consideration or is never zero.

Under the above conditions the Wolfowitz lower bound for the variance of unbiased estimates is achieved only when $\Pr(n = N) = 1$.

Proof: This bound will be attained if and only if there exists an unbiased estimate $\theta^*$ of $\theta$ such that

(4.2.1) $E(\theta^* - \theta - K\phi_1(n))^2 = 0$,

that is,

(4.2.2) $\theta^* - \theta = K\phi_1(n)$

with probability one, where $K$ is independent of all $x_i$ and $n$. As there exists an ‘efficient’ estimate, say $\psi(M)$, for all $M \geq N$, we have

(4.2.3) $\psi(M) - \theta = \frac{\phi_1(M)}{M \cdot E\left[\left(\frac{\partial}{\partial\theta}\log f(x; \theta)\right)^2\right]}$

for all $M \geq N$. From (4.2.2) and (4.2.3), it follows that

(4.2.4) $\theta^* - \theta = K \cdot n \cdot (\psi(n) - \theta) \cdot E\left[\left(\frac{\partial}{\partial\theta}\log f(x; \theta)\right)^2\right]$.

Now as

(4.2.5) $K = \frac{1}{En \cdot E\left[\left(\frac{\partial}{\partial\theta}\log f(x; \theta)\right)^2\right]}$,

we have

(4.2.6) $\theta^* - \theta = \frac{n \cdot (\psi(n) - \theta)}{En}$.

If $E(n)$ is independent of $\theta$, then from (4.2.6), we obtain

(4.2.7) $n/E(n) = 1$,

that is, $n$ is constant with probability one and the sequential procedure reduces to a fixed size sample case. If $E(n)$ is not independent of $\theta$, then differentiating (4.2.6) with regard to $\theta$, we obtain

(4.2.8) $1 = \frac{n \cdot (\psi(n) - \theta)}{(En)^2} \cdot \frac{d}{d\theta}(En) + \frac{n}{En}$.

As $\frac{d}{d\theta}(En)$ is not equal to zero for any $\theta$ under consideration, substituting the value of $\psi(n)$ from (4.2.8) in (4.2.6), the latter takes the form:

(4.2.9) $\theta^* - \theta = \frac{En - n}{\frac{d}{d\theta}(En)}$.

Differentiating the above with respect to $\theta$, the result is:

(4.2.10) $-1 = -\frac{En - n}{\left[\frac{d}{d\theta}(En)\right]^2} \cdot \frac{d^2}{d\theta^2}(En) + 1$.

Now if $\frac{d^2}{d\theta^2}(En) = 0$, then (4.2.10) is not valid, thereby contradicting (4.2.2). If $\frac{d^2}{d\theta^2}(En) \neq 0$, then rearranging (4.2.10), we obtain

(4.2.11) $n = -\frac{2\left[\frac{d}{d\theta}(En)\right]^2}{\frac{d^2}{d\theta^2}(En)} + En$,

that is, $n$ is a constant with probability one. This proves that the Wolfowitz bound is achieved only in the case when $n = N$ with probability one. This generalizes the result of Blackwell and Girshick [6]* to the extent that in [6] the existence of an efficient estimate is assumed for all integral values of $M$ instead of $M \geq N$, as assumed here. Moreover the proof given here, with slight modifications, is also valid when the successive observations are not independent.


* In [6] the assumption that “$x_1 + x_2 + \cdots + x_M$ be a sufficient statistic for all $M$” really amounts to the postulate that “$x_1 + x_2 + \cdots + x_M$ be an ‘efficient’ statistic for all $M$,” when we restrict ourselves to probability density functions satisfying the conditions given by Koopman in [7].





§4.3. Let us consider a sample of fixed size $N$. Let $\theta^*$ together with the probability density function $p_N$ satisfy the following regularity conditions:

(i) There exists a transformation $T$ from $(x_1, x_2, \ldots, x_N)$ to the variables

(4.3.1) $\xi_i = \xi_i(x_1, x_2, \ldots, x_N)$, $\theta^* = \theta^*(x_1, x_2, \ldots, x_N)$, $i = 1, 2, \ldots, N - 1$,

such that

(a) The functions are everywhere unique and continuous, and have continuous partial derivatives $\partial\xi_i/\partial x_u$, $\partial\theta^*/\partial x_u$ $(i = 1, 2, \ldots, N - 1;\ u = 1, 2, \ldots, N)$ in all points $(x_1, x_2, \ldots, x_N)$ except possibly in certain points belonging to a finite number of hyper-surfaces.

(b) The relations (4.3.1) define a one-to-one correspondence between the points $x = (x_1, x_2, \ldots, x_N)$ and $y = (\xi_1, \xi_2, \ldots, \xi_{N-1}, \theta^*)$ so that conversely $x_i = m_i(\xi_1, \xi_2, \ldots, \xi_{N-1}, \theta^*)$ where the $m_i$ are unique.

(ii) There exist partial derivatives of $g(\theta^*; \theta)$, $h(\xi_1, \xi_2, \ldots, \xi_{N-1} \mid \theta^*; \theta)$ with regard to $\theta$ of all orders up to and including $m$, where $m$ is some finite integer. The variances of $\theta^*$, $h_i$ and $g_i \cdot g_j$, $i, j = 1, 2, \ldots, m$, are finite, where $h_i$ and $g_i$ are defined in section 1.

(iii) There exist functions $T_{i1}$, $T_{i2}$, $T_{i3}$ such that

$\left|\frac{\partial^i p_N}{\partial\theta^i}\right| < T_{i1}(x_1, x_2, \ldots, x_N)$; $\left|\frac{\partial^i g}{\partial\theta^i}\right| < T_{i2}(\theta^*)$; $\left|\frac{\partial^i h}{\partial\theta^i}\right| < T_{i3}(\xi_1, \xi_2, \ldots, \xi_{N-1}, \theta^*)$,

for all $\theta$ in $D$ and for almost all $(x_1, x_2, \ldots, x_N)$, where $D$ is an open interval. Further

$\int T_{i1}(x_1, x_2, \ldots, x_N) \prod_{u=1}^{N} dx_u$, $\int T_{i2}\, d\theta^*$ and $\int T_{i3}(\xi_1, \xi_2, \ldots, \xi_{N-1}; \theta^*) \prod_{u=1}^{N-1} d\xi_u$

are all finite, the range of integration, in each case, being the whole range for the arguments indicated. Then the necessary and sufficient conditions that the variance of $\theta^*$ equals the lower bound given in (1.1.4) are





(iv) $h_1, h_2, \ldots, h_m$ are linearly dependent considered as functions of $\xi_1, \xi_2, \ldots, \xi_{N-1}$ for any given $\theta^*$ and $\theta$, and

(v) The probability density function $g$ of $\theta^*$ is of the form

$\theta^* - \theta = \sum_{i=1}^{m} K_i g_i$

where $K_i$ may depend upon $\theta$ and $N$ only.

The proof here is given when $p_N$ is a probability density function. It is also valid with slight modification when $p_N$ is the probability of discrete variables.

Proof: Let $J$ be the Jacobian of the transformation $T$ in (4.3.1). Then because of conditions (i) and (ii) above, we have

(4.3.2) $p_N(x_1, x_2, \ldots, x_N; \theta) \cdot |J| = g(\theta^*; \theta) \cdot h(\xi_1, \xi_2, \ldots, \xi_{N-1} \mid \theta^*; \theta)$.

Further

(4.3.3) $\int h(\xi_1, \xi_2, \ldots, \xi_{N-1} \mid \theta^*; \theta) \prod_{u=1}^{N-1} d\xi_u = 1$,

the range of integration being the space of $\xi_1, \xi_2, \ldots, \xi_{N-1}$. Differentiating the above $i$ times under the integral sign, it follows that

(4.3.4) $E(h_i \mid \theta^*; \theta) = 0$.

Similarly we have

(4.3.5) $E(g_i \cdot h_i) = 0$

as the expectation of the quantity on the L.H.S. is finite by virtue of (ii). More generally, we have

(4.3.6) $E(F(\theta^*) \cdot h_i) = E[F(\theta^*) \cdot E(h_i \mid \theta^*)] = 0$

if $E(F(\theta^*) \cdot h_i)$ is finite. Let us now examine

(4.3.7) $E\left(\theta^* - \theta - \sum_{i=1}^{m} K_i \phi_i(N)\right)^2$,

where $K_i \phi_i(N)$ can also be written as

(4.3.8) $K_i\left[g_i + \binom{i}{1} h_1 g_{i-1} + \cdots + h_i\right]$.

Now (4.3.7) can be put in the form

(4.3.9) $E\left(\theta^* - \theta - \sum_{i=1}^{m} K_i g_i - \sum_{i=1}^{m} L_i h_i\right)^2$,

where

(4.3.10) $L_i = \sum_{j=i}^{m} \binom{j}{i} K_j g_{j-i}$, $(i = 1, 2, \ldots, m)$,

with $g_0 = 1$, clearly depend on $\theta$ and $\theta^*$ only.




By virtue of (4.3.4)-(4.3.6), and $F(\theta^*)$ involved in (4.3.9) being such that $E[F(\theta^*) \cdot h_i]$ $(i = 1, 2, \ldots, m)$ is finite because of (ii), we can further reduce (4.3.9) to

(4.3.11) $E\left(\theta^* - \theta - \sum_{i=1}^{m} K_i g_i\right)^2 + E\left[E\left(\left(\sum_{i=1}^{m} L_i h_i\right)^2 \,\Big|\, \theta^*\right)\right]$.

The lower bound will be achieved if and only if the above expression is zero, the necessary and sufficient conditions for which are:

(4.3.12) $\theta^* - \theta = \sum_{i=1}^{m} K_i \cdot g_i$,

and

(4.3.13) $\sum_{i=1}^{m} L_i h_i \equiv 0$ in $\xi_1, \xi_2, \ldots, \xi_{N-1}$

for any given values of $\theta^*$ and $\theta$.

(4.3.13) is equivalent to the condition that $h_i$ $(i = 1, 2, \ldots, m)$ are linearly dependent considered as functions of $\xi_1, \xi_2, \ldots, \xi_{N-1}$ for any given values of $\theta$ and $\theta^*$.

When $m$ takes the value one, the above reduces to the Cramér conditions for the existence of an “efficient” estimate.

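§4.4. Multiparameter case. Let $\theta_1^*, \theta_2^*, \ldots, \theta_T^*$ be the unbiased estimates of $\theta_1, \theta_2, \ldots, \theta_T$ in the probability density function

$p_N(x_1, x_2, \ldots, x_N; \theta_1, \theta_2, \ldots, \theta_T)$

and let the regularity conditions of §4.3 be satisfied when $\theta^*$ and $\frac{\partial^i}{\partial\theta^i}$ $(i = 1, 2, \ldots, m)$ are replaced by $\theta_j^*$ $(j = 1, 2, \ldots, T)$ and

$\frac{\partial^{\,i_1 + i_2 + \cdots + i_T}}{\partial\theta_1^{i_1}\, \partial\theta_2^{i_2} \cdots \partial\theta_T^{i_T}}$, $(i_1 + i_2 + \cdots + i_T \leq m)$

respectively. Further let

(4.4.1) $p_N(x_1, x_2, \ldots, x_N; \theta_1, \theta_2, \ldots, \theta_T) \cdot |J| = g(\theta_1^*, \theta_2^*, \ldots, \theta_T^*; \theta_1, \theta_2, \ldots, \theta_T) \cdot h(\xi_1, \xi_2, \ldots, \xi_{N-T} \mid \theta_1^*, \theta_2^*, \ldots, \theta_T^*; \theta_1, \theta_2, \ldots, \theta_T)$

where $g$ and $h$ are respectively the joint probability distribution function of $\theta_1^*, \theta_2^*, \ldots, \theta_T^*$ and the conditional probability distribution of $\xi_1, \xi_2, \ldots, \xi_{N-T}$ for a given set of values of $\theta_1^*, \theta_2^*, \ldots, \theta_T^*$. In order that the ellipsoid (3.1.20) coincides with the one given by (3.1.21), it is necessary and sufficient that the following be satisfied for each $t$ $(t = 1, 2, \ldots, T)$:

(4.4.2) $E\left(\theta_t^* - \theta_t - \sum_{i_1 + i_2 + \cdots + i_T \leq m} K^{(t)}_{i_1 i_2 \cdots i_T}\, \phi_{i_1 i_2 \cdots i_T}(N)\right)^2 = 0$.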





Now reasoning similar to that in §4.3, we conclude from the above that the necessary and sufficient conditions are:

(4.4.3) There exist $T$ independent linear combinations of $h_{i_1 i_2 \cdots i_T}$, $i_1 + i_2 + \cdots + i_T \leq m$, which vanish with probability one for any given values of the sets $(\theta_1, \theta_2, \ldots, \theta_T)$ and $(\theta_1^*, \theta_2^*, \ldots, \theta_T^*)$,

and

(4.4.4) $\theta_t^* - \theta_t = \sum_{i_1 + i_2 + \cdots + i_T \leq m} K^{(t)}_{i_1 i_2 \cdots i_T}\, g_{i_1 i_2 \cdots i_T}$, $t = 1, 2, \ldots, T$,

where the $K$'s do not depend upon the $\theta^*$'s and $\xi$'s. For $T = 1$, the above reduce to the conditions in §4.3. We will now give an example in which (4.4.3) and (4.4.4) are satisfied. Let


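(4.4.5) $p_N(x_1, x_2, \ldots, x_N; \theta_1, \theta_2) = (2\pi\theta_1)^{-N/2} \exp\left[-\frac{1}{2\theta_1} \sum_{i=1}^{N} (x_i - \theta_2)^2\right]$.

We have

(4.4.6) $\theta_1^* = \sum_{i=1}^{N} (x_i - \bar{x})^2/(N - 1)$,

(4.4.7) $\theta_2^* = \sum_{i=1}^{N} x_i/N = \bar{x}$,

unbiased estimates of $\theta_1$ and $\theta_2$ in (4.4.5). The joint distribution of $\theta_1^*$ and $\theta_2^*$ is given by

(4.4.8) $g(\theta_1^*, \theta_2^*; \theta_1, \theta_2) = C \cdot \theta_1^{-N/2} (\theta_1^*)^{(N-3)/2} \exp\left[-\frac{N(\theta_2^* - \theta_2)^2 + (N - 1)\theta_1^*}{2\theta_1}\right]$,

where $C$ is free of $\theta_1$ and $\theta_2$. It can be easily seen that the condition (4.4.3) is satisfied, and the estimates themselves can be put in the form

(4.4.9) $\theta_1^* - \theta_1 = \frac{2\theta_1^2}{N - 1} \cdot \frac{1}{g}\frac{\partial g}{\partial\theta_1} - \frac{\theta_1^2}{N(N - 1)} \cdot \frac{1}{g}\frac{\partial^2 g}{\partial\theta_2^2}$,

(4.4.10) $\theta_2^* - \theta_2 = \frac{\theta_1}{N} \cdot \frac{1}{g}\frac{\partial g}{\partial\theta_2}$.

It is thus seen that the ‘concentration’ ellipsoid for $\theta_1^*$, $\theta_2^*$ coincides with the ellipsoid (3.1.21) for $m = 2$. On the other hand if we use $m = 1$, the condition (4.4.3) is satisfied but not the one in (4.4.4), as can be seen from (4.4.9), and thus the concentration ellipsoid strictly contains within itself the one given by the information matrix. It may be noted that for $m = 1$, the condition (4.4.3) merely requires that a system of sufficient statistics exists for estimating $\theta_1, \theta_2, \ldots, \theta_T$. The reason is that the condition (4.4.3) takes the equivalent form

(4.4.11) $\frac{\partial h}{\partial\theta_t} = 0$

for $t = 1, 2, \ldots, T$ identically in $\xi_1, \xi_2, \ldots, \xi_{N-T}$, that is, that $h$ is free of $\theta_1, \theta_2, \ldots, \theta_T$.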
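
The representation of the two estimates in the normal example of §4.4, in terms of the first derivative of $g$ with respect to $\theta_1$ and the first two derivatives with respect to $\theta_2$, can be checked numerically. The sketch below is an illustration added here, not part of the paper: it assumes that the joint density of $(\theta_1^*, \theta_2^*)$ depends on $(\theta_1, \theta_2)$ only through $\theta_1^{-N/2}\exp[-(N(\theta_2^* - \theta_2)^2 + (N - 1)\theta_1^*)/(2\theta_1)]$, uses central finite differences, and the names `log_g` and `check_estimates` are hypothetical.

```python
import math


def log_g(t1s, t2s, t1, t2, N):
    """log g up to an additive constant that is free of theta_1 and theta_2."""
    return (-0.5 * N * math.log(t1)
            - (N * (t2s - t2) ** 2 + (N - 1) * t1s) / (2.0 * t1))


def check_estimates(t1s, t2s, t1, t2, N, h=1e-4):
    """Finite-difference values of the right-hand sides of (4.4.9) and (4.4.10)."""
    base = log_g(t1s, t2s, t1, t2, N)

    def ratio(a, b):
        # g evaluated at parameters (a, b), divided by g at (t1, t2)
        return math.exp(log_g(t1s, t2s, a, b, N) - base)

    g1 = (ratio(t1 + h, t2) - ratio(t1 - h, t2)) / (2 * h)        # (1/g) dg/dtheta_1
    g2 = (ratio(t1, t2 + h) - ratio(t1, t2 - h)) / (2 * h)        # (1/g) dg/dtheta_2
    g22 = (ratio(t1, t2 + h) - 2.0 + ratio(t1, t2 - h)) / h ** 2  # (1/g) d2g/dtheta_2^2
    rhs_449 = 2 * t1 ** 2 / (N - 1) * g1 - t1 ** 2 / (N * (N - 1)) * g22
    rhs_4410 = t1 / N * g2
    return rhs_449, rhs_4410


if __name__ == "__main__":
    # Illustrative values: theta_1* = 2.7, theta_2* = 1.3, theta_1 = 2, theta_2 = 1, N = 10.
    print(check_estimates(2.7, 1.3, 2.0, 1.0, 10))
```

Up to finite-difference error the two returned values agree with $\theta_1^* - \theta_1$ and $\theta_2^* - \theta_2$ for the chosen inputs; dropping the second-derivative term shows why $m = 1$ is not enough for $\theta_1^*$.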


5. Miscellaneous. In §5.1-§5.3 we discuss certain properties of $\phi_i(n)$. In §5.4 we obtain conditions under which there exists no unbiased estimate of $\theta$, having a finite variance, which is functionally dependent upon a given unbiased estimate $\theta^*$ of $\theta$.

§5.1. Assume that there exists an “efficient” statistic $\theta^*(x_1, x_2, \ldots, x_N)$ for estimating $\theta$, in the probability density function (or probability)

$p_N(x_1, x_2, \ldots, x_N; \theta)$.

That is,

(5.1.1) $\theta^*(x_1, x_2, \ldots, x_N) - \theta = K \cdot \phi_1(N)$

where $K$ as usual may only depend on $\theta$. We postulate as usual the existence of all partial derivatives of $p_N$ of all orders and also of $K$ up to the third order with

(5.1.2) $\frac{d^3 K}{d\theta^3} = 0$.

Further we assume that

$\left|\frac{\partial^i p_N}{\partial\theta^i}\right| < T_i(x_1, x_2, \ldots, x_N)$

where

$\int T_i(x_1, x_2, \ldots, x_N) \prod_{u=1}^{N} dx_u$

is finite for all $i$.

Under the above assumptions we will show that

$\phi_0(N) = 1, \phi_1(N), \phi_2(N), \ldots, \phi_i(N), \ldots$

form a set of orthogonal polynomials in $\phi_1(N)$ with respect to the weight function

$p_N(x_1, x_2, \ldots, x_N; \theta)$.

Proof: We can easily see that

(5.1.3) $\frac{\partial\phi_i}{\partial\theta} = \phi_{i+1} - \phi_1\phi_i$

where $\phi_i(N)$ is shortened to $\phi_i$ for convenience. Differentiating (5.1.1) with respect to $\theta$,

(5.1.4) $\frac{\partial\phi_1}{\partial\theta} = -\frac{1}{K}\frac{dK}{d\theta}\phi_1 - \frac{1}{K}$.

Let us designate

(5.1.5) $z_i = \frac{1}{K}\frac{d^i K}{d\theta^i}$

for all integral values of $i$. From (5.1.3) and (5.1.4), it follows that

(5.1.6) $\phi_2 - \phi_1\phi_1 = -z_1\phi_1 - \frac{1}{K}$.

Differentiating (5.1.6) further with regard to $\theta$ and using (5.1.3) and (5.1.6) we obtain

(5.1.7) $\phi_3 - \phi_1\phi_2 = -2z_1\phi_2 - \left(\frac{2}{K} + z_2\right)\phi_1$.

Differentiating (5.1.7) with regard to $\theta$, and using (5.1.2) we get

(5.1.8) $\phi_4 - \phi_1\phi_3 = -3z_1\phi_3 - \left(\frac{3}{K} + 3z_2\right)\phi_2$.

We assume generally that

(5.1.9) $\phi_{i+1} - \phi_1\phi_i = -iz_1\phi_i - \left(\frac{i}{K} + \frac{i(i - 1)}{2}z_2\right)\phi_{i-1}$.

Differentiating (5.1.9), and employing (5.1.2), (5.1.3) and (5.1.9) we obtain

(5.1.10) $\phi_{i+2} - \phi_1\phi_{i+1} = -(i + 1)z_1\phi_{i+1} - \left(\frac{i + 1}{K} + \frac{(i + 1)i}{2}z_2\right)\phi_i$.

We know that (5.1.9) holds for $i = 1, 2, 3$, $\phi_0$ being taken equal to one, and we have proved that if (5.1.9) is true for $i = j$ it is true for $i = j + 1$. Thus by mathematical induction (5.1.9) holds good for all integral values of $i$.

It is also clear from (5.1.6) and (5.1.9) that $\phi_i$ can be expressed as a polynomial in $\phi_1$ of the $i$th degree, the coefficient of $\phi_1^i$ being equal to unity.

To complete the proof of our assertion we will now prove that

(5.1.11) $E(\phi_i \cdot \phi_j) = 0$, $i \neq j$.

From (5.1.9)

(5.1.12) $\phi_1 \cdot \phi_i = \phi_{i+1} + iz_1\phi_i + \left(\frac{i}{K} + \frac{i(i - 1)}{2}z_2\right)\phi_{i-1}$,

where $i$ is any positive integer. We multiply both sides of (5.1.12) by $\phi_1$ and reduce every product to a linear combination of $\phi_{i+1}$, $\phi_i$ and $\phi_{i-1}$ with the help of (5.1.12). Repeating this process $j - 1$ times $(j < i)$ it follows that:

(5.1.13) $\phi_1^j \cdot \phi_i = \phi_{i+j} + \sum_{u=1}^{2j} d_u\, \phi_{i+j-u}$,

where the $d_u$ are functions of $K$, $z_1$ and $z_2$. From (5.1.13), by taking expectations of both sides,

(5.1.14) $E(\phi_1^j \cdot \phi_i) = 0$, $(j < i)$.



Now, since $\phi_j$ is a polynomial of the $j$th degree in $\phi_1$, we conclude that (5.1.11) is true for all integral (positive) values of $i$.

Thus we obtain

(5.1.15) $\phi_0(N) = 1, \phi_1(N), \phi_2(N), \ldots, \phi_i(N), \ldots$

as a set of orthogonal polynomials in $\phi_1(N)$, the weight function being

$p_N(x_1, x_2, \ldots, x_N; \theta)$.

Furthermore 


(5.1.16) 
where 

(5.1.17) 
and 

(5.1.18) 
Hence 
(5.1.19) 


«-l 

& = n B, 


/-2 


R _ j(j - 1) , , i 

B, -+ 


E{<t>i *(t>i) — n B? • 

j-i 


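Thus if we divide $\phi_i$ by $\sqrt{\prod_{j=1}^{i} B_j}$, (5.1.15) becomes the orthonormal set.

Some cases, where we obtain $\phi_i$ as orthogonal polynomials, are given below.

1. $p_N = (2\pi)^{-N/2} \exp\left[-\tfrac{1}{2}\sum_{i=1}^{N}(x_i - \theta)^2\right]$, $\phi_1 = \sum_{i=1}^{N}(x_i - \theta)$.

2. $p_N = (2\pi\theta)^{-N/2} \exp\left[-\frac{1}{2\theta}\sum_{i=1}^{N} x_i^2\right]$, $\phi_1 = \frac{\sum_{i=1}^{N} x_i^2}{2\theta^2} - \frac{N}{2\theta}$.

3. $p_N = \theta^{\sum x_i}(1 - \theta)^{N - \sum x_i}$ ($x_i = 1$ with prob. $\theta$, $x_i = 0$ with prob. $1 - \theta$), $\phi_1 = \frac{\sum_{i=1}^{N} x_i - N\theta}{\theta(1 - \theta)}$.

4. $p_N = \frac{e^{-N\theta}\,\theta^{\sum x_i}}{\prod_{i=1}^{N} x_i!}$, $\phi_1 = \frac{\sum_{i=1}^{N} x_i - N\theta}{\theta}$.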



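$A_i$ and $B_i$, the coefficients of $\phi_i$ and $\phi_{i-1}$ respectively in (5.1.12), for the above four cases are given below:

Case    $A_i$                                   $B_i$

1.      $0$                                     $iN$

2.      $2i/\theta$                             $\frac{i(i - 1)}{\theta^2} + \frac{iN}{2\theta^2}$

3.      $\frac{i(1 - 2\theta)}{\theta(1 - \theta)}$     $\frac{-i(i - 1)}{\theta(1 - \theta)} + \frac{iN}{\theta(1 - \theta)}$

4.      $i/\theta$                              $iN/\theta$

It may be mentioned that in all these cases the $\phi_i$ are also a complete set of polynomials.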
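
For the first case (normal mean with unit variance) the orthogonality can be verified exactly. The sketch below is an added illustration under stated assumptions, not the author's computation: it takes $t = \phi_1(N)$ to be normally distributed with mean zero and variance $N$, builds $\phi_0, \phi_1, \ldots$ from the three-term recurrence with $A_i = 0$ and $B_i = iN$, and evaluates $E(\phi_i\phi_j)$ from the moments of $t$ in integer arithmetic. The function names are hypothetical.

```python
from math import factorial


def gaussian_moment(k, var):
    """E[t^k] for t ~ N(0, var); exact when var is an integer."""
    if k % 2:
        return 0
    m = 1
    for odd in range(1, k, 2):  # (k - 1)!! = 1 * 3 * ... * (k - 1)
        m *= odd
    return m * var ** (k // 2)


def score_polys(N, max_deg):
    """Coefficient lists (lowest power first) of phi_0..phi_max_deg in t = phi_1."""
    polys = [[1], [0, 1]]
    for i in range(1, max_deg):
        shifted = [0] + polys[i]  # t * phi_i
        prev = polys[i - 1] + [0] * (len(shifted) - len(polys[i - 1]))
        # phi_{i+1} = t * phi_i - A_i * phi_i - B_i * phi_{i-1}, with A_i = 0, B_i = i N
        polys.append([s - i * N * q for s, q in zip(shifted, prev)])
    return polys


def inner(a, b, var):
    """E[a(t) b(t)] for polynomials a, b and t ~ N(0, var)."""
    return sum(ca * cb * gaussian_moment(i + j, var)
               for i, ca in enumerate(a) for j, cb in enumerate(b))


if __name__ == "__main__":
    N = 5
    polys = score_polys(N, 5)
    for i, a in enumerate(polys):
        print(i, [inner(a, b, N) for b in polys])
```

In this case the diagonal entries equal $\prod_{j=1}^{i} B_j = i!\,N^i$, so dividing $\phi_i$ by the square root of that product gives an orthonormal set.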

§5.2. Let $\sum_{i=1}^{m} K_i\phi_i(n)$, where $K_i$ $(i = 1, 2, \ldots, m)$ depends upon $\theta$, be such that $\sum_{i=1}^{m} K_i\phi_i(n)$ and $\phi_i(n)$ satisfy the regularity conditions mentioned in §2.2. Then we will show that $\sum_{i=1}^{m} K_i\phi_i(n)$ cannot be a function of $x_1, x_2, \ldots, x_n$ alone except for constant zero.

Proof: Let us assume that $\sum_{i=1}^{m} K_i \cdot \phi_i(n)$ is independent of $\theta$, that is, it is some statistic, say,

(5.2.1) $\theta^*(x_1, x_2, \ldots, x_n) = \sum_{i=1}^{m} K_i\phi_i(n)$.

Taking expectations of both sides, we obtain:

(5.2.2) $E(\theta^*(x_1, x_2, \ldots, x_n)) = \sum_{i=1}^{m} K_i E\phi_i(n) = 0$.

Differentiating (5.2.2) $i$ times with regard to $\theta$, we have, because of the regularity conditions on $\phi_i(n)$ and $\theta^*(x_1, x_2, \ldots, x_n)$,

(5.2.3) $E[\theta^*(x_1, x_2, \ldots, x_n) \cdot \phi_i(n)] = 0$, $i = 1, 2, \ldots, m$.

It may be noted that this is similar to the result in (2.3). From (5.2.3) and (5.2.1) it follows that

(5.2.4) $E[\theta^*(x_1, x_2, \ldots, x_n)]^2 = 0$.

Thus $\theta^*(x_1, x_2, \ldots, x_n)$ is zero with probability one, that is, $\sum_{i=1}^{m} K_i\phi_i(n)$, if independent of $\theta$, is zero with probability one. This proves our assertion that this cannot be a function of $x_1, x_2, \ldots, x_n$ alone except for constant zero. From the foregoing we deduce the following conclusions:

I. $\phi_i(n)$ or any power of it cannot be a function of the observations free of $\theta$.





II. If a statistic $\theta^*(x_1, x_2, \ldots, x_n)$, which is not a constant with probability one, can be put in the form

(5.2.5) $\theta^*(x_1, x_2, \ldots, x_n) = K_0 + \sum_{i=1}^{m} K_i \cdot \phi_i(n)$,

where $m$ is some finite positive integer, then

(i) $K_0$ must depend upon $\theta$.

(ii) The expression (5.2.5) for $\theta^*(x_1, x_2, \ldots, x_n)$ in $\phi_i(n)$ is unique.

(iii) No other unbiased estimate of $K_0$ satisfying the regularity conditions can be put in the form (5.2.5).

(iv) When $m = 1$, there is no other statistic except $a\theta^* + b$, where $a$ and $b$ are constants independent of $\theta$, which can be put in the above form $K_0 + K_1 \cdot \phi_1(n)$, where $K_0$ and $K_1$ are differentiable functions of $\theta$ and $K_1$ does not vanish for any $\theta$ under consideration.

(v) Let $\xi$ be any function of $x_1, x_2, \ldots, x_n$ free of $\theta$, satisfying the regularity conditions of §2.2 with $E(\xi) = 0$. Since the covariance between $\xi$ and $\theta^*(x_1, x_2, \ldots, x_n)$ in (5.2.5) is equal to zero, the statistic of the form (5.2.5) has the least variance of all unbiased estimates of $K_0$ that satisfy the regularity conditions of §2.2.

Also, if the probability density or the probability function depends on more than one parameter, then all the above results except (iv) hold good if

$\sum_{i=1}^{m} K_i \cdot \phi_i(n)$

is replaced by

$\sum_{i_1 + i_2 + \cdots + i_T \leq m} K_{i_1 i_2 \cdots i_T} \cdot \phi_{i_1 i_2 \cdots i_T}(n)$.

§5.3. Let us now prove the assertion made in (iv) of §5.2, when $m$ is equal to one.

Suppose, on the contrary, that there is a statistic $\theta_1^*(x_1, x_2, \ldots, x_n)$ which is of the form

(5.3.1) $\theta_1^*(x_1, x_2, \ldots, x_n) = L_0 + L_1 \cdot \phi_1(n)$.

$\theta^*(x_1, x_2, \ldots, x_n)$, of course, has the form

(5.3.2) $\theta^*(x_1, x_2, \ldots, x_n) = K_0 + K_1 \cdot \phi_1(n)$.

We will assume $K_0$, $K_1$, $L_0$, $L_1$ to be differentiable functions of $\theta$ and that $K_1$, $L_1$ do not vanish for values of $\theta$ under consideration.

Differentiating, with respect to $\theta$, the expressions in (5.3.1) and (5.3.2), we have

(5.3.3) $\frac{dL_0}{d\theta} + \frac{dL_1}{d\theta}\phi_1 + L_1(\phi_2 - \phi_1^2) = 0$;

(5.3.4) $\frac{dK_0}{d\theta} + \frac{dK_1}{d\theta}\phi_1 + K_1(\phi_2 - \phi_1^2) = 0$,

where $\phi_i$ is short for $\phi_i(n)$. Taking the expectations of the above and rearranging, it follows that

(5.3.5) $\frac{1}{L_1}\frac{dL_0}{d\theta} = \frac{1}{K_1}\frac{dK_0}{d\theta} = E(\phi_1^2)$.

From (5.3.3) to (5.3.5), we deduce that

(5.3.6) $\frac{1}{L_1}\frac{dL_1}{d\theta} = \frac{1}{K_1}\frac{dK_1}{d\theta}$.

Now solving the above differential equation, we get

(5.3.7) $L_1 = aK_1$,

where $a$ is a constant independent of $\theta$. From (5.3.5) and (5.3.7) it follows that

(5.3.8) $L_0 = aK_0 + b$,

where $b$ is a constant independent of $\theta$. From (5.3.7) and (5.3.8) we conclude that the statistic in (5.3.1) must be of the form $a\theta^* + b$, which proves our assertion. An immediate consequence is that if there exists an efficient statistic for estimating $\gamma(\theta)$, then no other function of $\theta$ except $a\gamma(\theta) + b$ can have an efficient estimate.*

§5.4. If $\theta^*(x_1, x_2, \ldots, x_n)$ is an unbiased estimate of $\theta$ satisfying the following conditions:

(i) Among all unbiased estimates of $\theta$ having finite variances, which are also functions of $\theta^*$, $\theta^*$ is one with the least variance,

(ii) For all $\theta$ there exists a complete set of polynomials with respect to the distribution function of $\theta^*$,

then there exists no unbiased estimate of $\theta$ with a finite variance, which is functionally dependent upon $\theta^*$, except $\theta^*$ itself.

Proof: Let $\theta^*$ be the unbiased estimate of $\theta$ which has the least variance among all unbiased estimates of $\theta$ which are functions of $\theta^*$. Further let $S(\theta^*)$ be any function of $\theta^*$, free of $\theta$, whose expectation exists and is equal to zero. Let the variance of $S(\theta^*)$ be finite. It is well known that for any such $S(\theta^*)$

(5.4.1) $E(\theta^* S(\theta^*)) = 0$.

Now $\theta^* S(\theta^*)$ in turn having expectation equal to zero, we obtain

(5.4.2) $E(\theta^{*2} S(\theta^*)) = 0$.

Repeating the above $i$ times we obtain, in general, that

(5.4.3) $E(\theta^{*i} S(\theta^*)) = 0$


d 2 

* We assume the existence of — (i = 1, 2) and - for all 0, and also postulate that 

do* dd 

and E(4>\) do not vanish for any 6 under consideration. 





for all positive integers $i$. From the above, with the help of condition (ii), we conclude that $S(\theta^*)$ must be equal to zero. Thus if $H(\theta^*)$ is an unbiased estimate of $\theta$ with finite variance, then from the above, $H(\theta^*) - \theta^*$, having the expectation zero and a finite variance, must be zero with probability one. Thus $H(\theta^*)$ is the same as $\theta^*$, which proves the result.

Example. If $\theta^*$ is of the form (5.2.5) and condition (ii) is satisfied, then there is no other function of $\theta^*$, free of $\theta$ and having a finite variance, whose expectation is $K_0$.

Conditions (i) and (ii) above are satisfied for estimating $\theta$ in the examples quoted at the end of section 5.1, and thus in these cases the result holds good when $\theta^*$ is the efficient estimate.

I am highly thankful to Professor J. Wolfowitz for his guidance and help in this research.


REFERENCES

[1] H. Cramér, Mathematical Methods of Statistics, Princeton Univ. Press, 1946, p. 480.

[2] C. R. Rao, “Information and the accuracy attainable in the estimation of statistical parameters,” Calcutta Math. Soc. Bull., September, 1945.

[3] A. Bhattacharyya, “On some analogues of the amount of information and their use in statistical estimation,” Sankhya, Vol. 8 (1946); also “On some analogues of the amount of information and their use in statistical estimation,” Sankhya, Vol. 8 (1947).

[4] H. Cramér, “Contributions to the theory of statistical estimation,” Skandinavisk Aktuar. tids., Vol. 29 (1946), pp. 5-94.

[5] J. Wolfowitz, “Efficiency of sequential estimates,” Annals of Math. Stat., Vol. 18 (1947).

[6] Blackwell and Girshick, “A lower bound for the variance of some unbiased sequential estimates,” Annals of Math. Stat., Vol. 18 (1947).

[7] B. O. Koopman, “On distributions admitting a sufficient statistic,” Am. Math. Soc. Trans., Vol. 39 (1936), p. 399.



ON THE THEORY OF SOME NON-PARAMETRIC HYPOTHESES 

By E. L. Lehmann and C. Stein 
University of California, Berkeley

Summary. For two types of non-parametric hypotheses optimum tests are derived against certain classes of alternatives. The two kinds of hypotheses are related and may be illustrated by the following example: (1) The joint distribution of the variables $X_1, \ldots, X_m, Y_1, \ldots, Y_n$ is invariant under all permutations of the variables; (2) the variables are independently and identically distributed. It is shown that the theory of optimum tests for hypotheses of the first kind is the same as that of optimum similar tests for hypotheses of the second kind. Most powerful tests are obtained against arbitrary simple alternatives, and in a number of important cases most stringent tests are derived against certain composite alternatives. For the example (1), if the distributions are restricted to probability densities, Pitman's test based on $\bar{y} - \bar{x}$ is most powerful against the alternatives that the $X$'s and $Y$'s are independently normally distributed with common variance, and that $E(X_i) = \xi$, $E(Y_i) = \eta$ where $\eta > \xi$. If $\eta - \xi$ may be positive or negative the test based on $|\bar{y} - \bar{x}|$ is most stringent. The definitions are sufficiently general that the theory applies to both continuous and discrete problems, and that tied observations present no difficulties. It is shown that continuous and discrete problems may be combined. Pitman's test for example, when applied to certain discrete problems, coincides with Fisher's exact test, and when $m = n$ the test based on $|\bar{y} - \bar{x}|$ is most stringent for hypothesis (1) against a broad class of alternatives which includes both discrete and absolutely continuous distributions.

1. Generalities. In the present paper we study the problem of determining 
optimum tests for certain non-parametric hypotheses. It is important in this 
connection to make some distinctions which are of lesser significance when the 
problem is approached from the intuitive point of view which has been customary 
in this field. Consider for example the hypothesis H that $Z_1, \ldots, Z_N$ are
independently and identically distributed according to an unknown probability 
density function. All tests which have been suggested for testing H are valid 
also for testing the hypothesis H' that the unknown joint probability density 
function of the Z's is symmetric in its N arguments. On the other hand, tests 
which have optimum properties for testing H' against a certain class of alterna¬ 
tives will in general not possess the same properties when H' is replaced by H. 
From the present point of view the two hypotheses mentioned are essentially 
different. We shall be concerned in this paper primarily with generalizations 
of H', and we shall show that many of the tests suggested in the literature have 
optimum properties for testing hypotheses of this kind against certain classes of 
alternatives. 

The corresponding general theory for hypotheses related to H is quite different. 



However the two theories do coincide, provided tests of these latter hypotheses are restricted to similar regions. More specifically, all results on optimum tests of H' are equivalent to the corresponding results on optimum similar tests of H, and this equivalence holds also for many of the more general hypotheses considered in this paper.

It should be observed that in many experimental situations, the hypothesis H' that the joint distribution of the Z’s is invariant under all permutations is more realistic than the hypothesis H that the Z’s are independently and identically distributed. For example, suppose there is a block of land divided into $m + n$ plots, and the experimenter wants to test whether one of two fertilizers (used in fixed amounts) is more effective than the other in increasing the yield of a certain plant. Of the plots, $m$ are chosen at random; fertilizer I is applied to these, and fertilizer II to the other $n$. If $X_i$ denotes the yield from the $i$th plot to which fertilizer I has been applied and $Y_j$ denotes the yield from the $j$th plot to which fertilizer II has been applied, where the plots are numbered at random, then the hypothesis that the two fertilizers are completely equivalent implies that the application of any permutation to $X_1, \ldots, X_m, Y_1, \ldots, Y_n$ does not change their joint distribution. But it is not reasonable to suppose the $X_i$, $Y_j$ are independently and identically distributed, since there may be intrinsic differences among the plots. For discussions of these and related points, see Fisher [1], Neyman [2], Pitman [3]. It may be that in many particular cases some hypothesis between the two is really appropriate but the hypothesis H is the only one that is evidently appropriate from a cursory inspection of the setup.
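
The hypothesis of invariance under permutations in the fertilizer example leads directly to a test that can be computed by enumeration. The sketch below is an added illustration, not taken from the paper: it assumes the statistic $\bar{y} - \bar{x}$ (or its absolute value), enumerates every way of assigning $n$ of the $m + n$ pooled yields to fertilizer II, and reports the exact proportion of assignments at least as extreme as the observed one. The function name `pitman_permutation_pvalue` is hypothetical, and the enumeration is practical only for small $m + n$.

```python
from fractions import Fraction
from itertools import combinations


def pitman_permutation_pvalue(x, y, two_sided=False):
    """Exact permutation p-value for the difference of means, mean(y) - mean(x)."""
    pooled = list(x) + list(y)
    m, n = len(x), len(y)
    total = sum(pooled)

    def stat(idx_y):
        sy = sum(pooled[i] for i in idx_y)
        diff = Fraction(sy) / n - Fraction(total - sy) / m
        return abs(diff) if two_sided else diff

    observed = stat(range(m, m + n))
    splits = list(combinations(range(m + n), n))
    extreme = sum(1 for idx in splits if stat(idx) >= observed)
    return Fraction(extreme, len(splits))


if __name__ == "__main__":
    print(pitman_permutation_pvalue([1, 2, 3], [4, 5, 6]))        # 1/20
    print(pitman_permutation_pvalue([1, 2, 3], [4, 5, 6], True))  # 1/10
```

Because the p-value is obtained by counting assignments, tied yields need no special treatment here; an exact level for every significance level additionally requires randomization on the boundary.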

Many of the alternative hypotheses considered below, for example those 
involving normality, are dictated more by tradition and ease of treatment than 
by appropriateness in actual experiments. Thus this paper should not be 
considered as providing absolute justification for tests such as Pitman’s but 
rather as suggesting a method of obtaining optimum non-parametric tests when
the class of alternatives is fairly well specified. 

Another possibility, first raised by Neyman [2], which has been ignored in this 
paper is the equality on the average of the two fertilizers but with fertilizer I 
having a larger dispersion than fertilizer II, or a distribution differing in some 
other characteristic. It would be reasonable to consider this as part of the 
hypothesis tested, but tests based on randomization may give a probability of 
rejection of the hypothesis of equivalence in this case which is much higher than 
the stated level of significance. We hope to return to problems of this type in 
later papers. 

Let us make the following basic assumptions. $\mathcal{Z}$ is a space of points $z$ and $\mathcal{A}$ is an additive class of subsets $A$ of $\mathcal{Z}$. Any member of $\mathcal{A}$ will be said to be measurable. By a probability distribution we mean a measure $F$, defined over $\mathcal{A}$, for which $F(\mathcal{Z}) = 1$. We shall be concerned with two classes of probability distributions: one, the class of all distributions, and two, the class of distributions which are absolutely continuous with respect to a given measure $\mu$, that is, the class of distributions $F$ for which there exists a function $f$ such that

(1.1) $F(A) = \int_A f(z)\, d\mu(z)$.

We shall call $f$ a generalized probability density function with respect to $\mu$. By $Z$ we denote a random variable such that for any $A$ in $\mathcal{A}$,

(1.2) $P[Z \in A] = F(A)$.

For most of the applications we shall take S to be a Euclidean space, and (J 
to be the class of all Borel sets. Then if /i is Lebesgue measure, (1.1) states that 
/ is a probability density function in the usual sense. However, we shall have 
occasion to consider also some measures other than Lebesgue measure. By a 
hypothesis H we mean a class of probability distributions. Next we describe 
the hypotheses with which we shall be concerned. Let 11 be a partition of 2, 
that is, let 11 be a class of mutually exclusive subsets S of 2 such that every 
point 2 of 2 lies in one of the sets S, If two points Zi and Z 2 lie in the same set S, 
we shall say that zi is equivalent to Z 2 with respect to U: zi Z 2 (mod H). The 
set of all points which are equivalent to z will be denoted by T{z)y the number of 
points of T{z) by n{z). Concerning 11 we make the following assumptions: 

(i) All sets in $\Pi$ are finite, so that $n(z)$ is finite for all $z$.

(ii) If we define $S_n$ as the union of all those sets $S$ of $\Pi$ which
contain exactly $n$ points, there exist mutually exclusive sets
$S_n^{(1)}, \cdots, S_n^{(n)}$, which are measurable and such that every element
$S$ of $\Pi$ containing exactly $n$ points has one and only one point in common
with each $S_n^{(i)}$.

We shall say that a measure $\mu$ is invariant under $\Pi$ if the following
condition holds: For all $n$ and $i, j \le n$, if $S$ is any set contained in
$S_n^{(i)}$ and if $S'$ denotes the set of equivalent points in $S_n^{(j)}$,
then $\mu(S) = \mu(S')$.

Given a partition $\Pi$ satisfying (i) and (ii), we formulate the hypothesis $H$
that the distribution $F$ of $Z$ is invariant under $\Pi$. We shall refer to $H$
as the hypothesis of invariance under $\Pi$. We shall also consider the
hypothesis of invariance under a partition for a class of generalized densities
$f$. In this case we assume that the measure $\mu$ of (1.1) is given, and that
$\Pi$, in addition to (i) and (ii), satisfies the condition:

(iii) The measure $\mu$ is invariant under $\Pi$. The hypothesis $H$ in this
case states that $z_1 \sim z_2 \pmod{\Pi}$ implies $f(z_1) = f(z_2)$.

By a test of a hypothesis $H$ we mean (see [4]) a measurable function $\varphi$
on $\mathcal{Z}$ to the interval [0, 1] which with every point
$z \in \mathcal{Z}$ associates a probability $\varphi(z)$ of rejection. This
definition, slightly more general than the usual one, is particularly useful in
non-parametric work. Among other advantages it automatically takes care of the
problem of tied observations. It also disposes of the difficulties encountered
by Scheffé [5] in his treatment of the problem of similar regions, as will be
shown in Lemma 1.

The size of a test $\varphi$ is defined to be

(1.3) $\epsilon(\varphi) = \sup_{F \in H} \int \varphi(z)\,dF(z).$



NON-PARAMETRIC HYPOTHESES 




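If in particular

(1.4) $\int \varphi\,dF = \epsilon(\varphi)$

for all $F$ in $H$, $\varphi$ is said to be similar for testing $H$. Extending
the terminology of Scheffé, we say that $\varphi$ has structure $S(\epsilon)$ if
for all $z$ in $S_n$

(1.5) $\sum_{z' \in T(z)} \varphi(z') = n\epsilon.$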

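The following lemma extends a result of Scheffé.

Lemma 1. For testing a hypothesis of invariance, any test of structure
$S(\epsilon)$ is similar and of size $\epsilon$.

Proof. For any $F$ in $H$ and any $\varphi$

(1.6) $\int \varphi\,dF = \sum_{n=1}^{\infty} \sum_{i=1}^{n} \int_{S_n^{(i)}} \varphi\,dF = \sum_{n=1}^{\infty} \int_{S_n^{(1)}} \Big[ \sum_{z' \in T(z)} \varphi(z') \Big]\,dF(z).$

But $\varphi$ has structure $S(\epsilon)$ and hence (1.5) holds for all $z$.
Therefore

(1.7) $\int \varphi\,dF = \sum_{n=1}^{\infty} n\epsilon \int_{S_n^{(1)}} dF = \epsilon.$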


We shall show next that for testing a hypothesis of invariance at level of
significance $\epsilon$, only tests of structure $S(\epsilon)$ need be
considered. In order to make this result applicable both to hypotheses
referring to the class of all distributions and to those referring to a class
of generalized densities, we shall state it in an asymmetric form which when
taken together with lemma 1 indicates the essential equivalence of the two
types of hypotheses.

Lemma 2. If $\varphi$ is any test of a hypothesis of invariance for the class of
generalized densities with respect to a fixed measure $\mu$, and if the size of
$\varphi$ is less than or equal to $\epsilon$, then there exists a test
$\varphi_1$ of structure $S(\epsilon)$ such that

(1.8) $\int \varphi_1\,dF \ge \int \varphi\,dF$

for all probability distributions $F$.

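Proof. First we shall show that

(1.9) $\frac{1}{n(z)} \sum_{z' \in T(z)} \varphi(z') \le \epsilon$

almost everywhere $\mu$. For let $A$ be the set of points $z$ such that

(1.10) $\frac{1}{n(z)} \sum_{z' \in T(z)} \varphi(z') > \epsilon$

and suppose that $\mu(A)$ is positive. Let

(1.11) $f(z) = \begin{cases} 1/\mu(A) & \text{if } z \in A; \\ 0 & \text{elsewhere.} \end{cases}$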





Then $f$ is in $H$ since by definition of $A$, whenever $z$ is in $A$, $T(z)$ is
contained in $A$. But

(1.12) $\int \varphi f\,d\mu > \epsilon \int f\,d\mu = \epsilon,$

in contradiction to the assumption that $\varphi$ has size $\le \epsilon$.

From (1.9) it follows easily that there exists a test $\varphi_1$ of structure
$S(\epsilon)$ and such that for all $z$

(1.13) $\varphi_1(z) \ge \varphi(z).$

Since condition (1.8) is then satisfied, this completes the proof.

Lemma 2 raises the question whether it is possible to reduce the problem of
testing a hypothesis of invariance still further, or whether the tests of
structure $S(\epsilon)$ form what Wald [6] has called an essentially complete
class of admissible tests. This question is answered by

Theorem 1. Let $\mu$ be a measure defined over $\mathcal{A}$. Let $\Pi_0$ and
$\Pi_1$ be two partitions of $\mathcal{Z}$ satisfying conditions (i), (ii) and
(iii), and such that $z \sim z' \pmod{\Pi_1}$ implies $z \sim z' \pmod{\Pi_0}$.
For the class of generalized densities with respect to $\mu$ denote by $H_i$
($i = 0, 1$) the hypothesis of invariance relative to $\Pi_i$. Then for testing
$H_0$ against $H_1$ at level of significance $\epsilon$, the totality of tests
which (a) have structure $S(\epsilon)$, and for which (b) $z \sim z'
\pmod{\Pi_1}$ implies $\varphi(z) = \varphi(z')$, form an essentially complete
class of admissible tests.

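Proof. It is easily seen that we can restrict ourselves to that subclass of
tests of structure $S(\epsilon)$ which possess property (b). For if $\varphi$ is
any test of structure $S(\epsilon)$ relative to $\Pi_0$, let

(1.14) $\varphi^*(z) = \frac{1}{n_1(z)} \sum_{z' \in T_1(z)} \varphi(z'),$

where $T_1(z)$ denotes the set of points equivalent to $z$ (mod $\Pi_1$) and
$n_1(z)$ the number of its points. Then clearly $\varphi^*$ possesses property
(b) and has structure $S(\epsilon)$. Furthermore if $f$ is any probability
density function of $H_1$, then

(1.15) $\int \varphi^* f\,d\mu = \int \varphi f\,d\mu,$

so that $\varphi$ and $\varphi^*$ have identical power against $H_1$.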

In order to complete the proof, we must show that if $\varphi_1$ and
$\varphi_2$ are any two tests satisfying (a) and (b), and if $\varphi_1$ and
$\varphi_2$ differ on a set of positive measure, there exists a probability
density function $f$ of $H_1$ for which

(1.16) $\int \varphi_1 f\,d\mu > \int \varphi_2 f\,d\mu.$

Since both $\varphi_1$ and $\varphi_2$ have structure $S(\epsilon)$, the set $A$
of points $z$ for which

(1.17) $\varphi_1(z) > \varphi_2(z)$

has positive measure. Also, because of (b), if two points are equivalent
relative to $\Pi_1$, they are either both in $A$ or both not in $A$. If $f(z)$
is defined as $1/\mu(A)$ for $z$ in $A$ and as zero elsewhere, then $f$ is in
$H_1$ and satisfies (1.16).





The theorem obtained from theorem 1 by letting the hypotheses Ho and Hi 
refer to the class of all probability distributions rather than to a particular class 
of generalized densities, is clearly also true, and cases between these two theorems 
could also be formulated. 

Since the most powerful test $\varphi$ for testing a hypothesis of invariance
$H_0$ referring to a class of generalized densities against an alternative $f$
from this class of densities has the correct size also for testing the wider
hypothesis $H_0$ referring to the class of all distributions, $\varphi$ is also
most powerful for testing $H_0$ against $f$. The corresponding remark holds for
most stringent tests. Therefore
all optimum tests that will be derived in the sequel, through the use of theorems 
of this section, may be considered as tests of hypotheses referring to the class 
of all distributions; they are valid against these hypotheses, and no power is 
gained by restricting the hypothesis to the appropriate class of generalized 
densities. 


2. Most powerful tests and most stringent tests. One of the main problems
to be considered in this paper is the determination of a most powerful test of a
hypothesis of invariance against a simple alternative. If we restrict our
considerations to the class of generalized densities with respect to $\mu$, a
complete solution of this problem is given by the following
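Theorem 2. Let $H$ be the hypothesis of invariance under the partition $\Pi$,
and let $g$ be a probability density function not in $H$. For any $z$ in $S_n$
denote by $z^{(1)}, \cdots, z^{(n)}$ the $n$ points of $T(z)$ arranged so that
$g(z^{(1)}) \ge g(z^{(2)}) \ge \cdots \ge g(z^{(n)})$. For testing $H$ against
$g$ a most powerful test of size $\epsilon$ is given by

(2.1) $\varphi(z) = \begin{cases} 1 & \text{if } g(z) > g(z^{(1+[n\epsilon])}), \\ a & \text{if } g(z) = g(z^{(1+[n\epsilon])}), \\ 0 & \text{if } g(z) < g(z^{(1+[n\epsilon])}), \end{cases}$ for $z$ in $S_n$,

where $0 \le a < 1$ and where $a$ may depend on $z$ through $T(z)$.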

Proof. First we observe that the number of $z^{(i)}$ for which
$g(z^{(i)}) \ge g(z^{(1+[n\epsilon])})$ is greater than or equal to
$1 + [n\epsilon] > n\epsilon$ and that the number of $z^{(i)}$ for which
$g(z^{(i)}) > g(z^{(1+[n\epsilon])})$ is less than or equal to
$[n\epsilon] \le n\epsilon$, so that there exists an $a$ between 0 and 1 for
which $\sum_{i=1}^{n} \varphi(z^{(i)}) = n\epsilon$. Since $\varphi$ has
structure $S(\epsilon)$, it follows from lemma 1 that it is similar and of size
$\epsilon$.

Let

(2.2) $g^*(z) = g(z^{(1+[n\epsilon])})$ for $z \in S_n$.

To complete the proof consider first the special case that

(2.3) $\int g^*(z)\,d\mu(z)$

vanishes. Then

(2.4) $\int \varphi(z)\,g(z)\,d\mu(z) = 1,$






that is, the test $\varphi$ has power 1, and therefore is clearly most powerful.
Assume next that the integral (2.3) is positive. Then $g^*$ is proportional to a
probability density function of $H$. For it is measurable and satisfies the
symmetry condition required of a member of $H$, and the integral (2.3) is
finite since

(2.5) $\begin{aligned} \int g^*(z)\,d\mu(z) &\le \sum_{n} \int_{S_n} \frac{1}{1 + [n\epsilon]} \sum_{i=1}^{1+[n\epsilon]} g(z^{(i)})\,d\mu(z) \\ &\le \frac{1}{\epsilon} \sum_{n} \int_{S_n} \frac{1}{n} \sum_{i=1}^{n} g(z^{(i)})\,d\mu(z) = \frac{1}{\epsilon}. \end{aligned}$

The test $\varphi$ therefore has the form of a probability ratio test. Since it
is also similar, it follows from theorem 1 of [4] that $\varphi$ is most
powerful.
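
The construction in theorem 2 can be illustrated on a single equivalence class.
The sketch below is ours, not the authors': the function name `orbit_test` is
hypothetical, and it assumes that the points of $T(z)$ are ordered by decreasing
$g$ and that the cut-off is the $(1 + [n\epsilon])$-th largest value, which is
how we read (2.1). Exact fractions are used so that the structure condition
(1.5) can be checked without rounding.

```python
from fractions import Fraction
from math import floor


def orbit_test(g_values, eps):
    """Rejection probabilities of the form (2.1) on one equivalence class T(z).

    g_values: values of the alternative density g at the n points of T(z).
    eps: level of significance with 0 < eps < 1 (a Fraction keeps sums exact).
    Returns a list phi, aligned with g_values, whose sum equals n * eps.
    """
    n = len(g_values)
    ordered = sorted(g_values, reverse=True)
    # g at the (1 + [n*eps])-th largest point; index is 0-based.
    threshold = ordered[floor(n * eps)]
    above = sum(1 for g in g_values if g > threshold)
    ties = sum(1 for g in g_values if g == threshold)
    # Randomization constant chosen so that the class total is n * eps.
    a = (n * eps - above) / ties
    return [1 if g > threshold else a if g == threshold else 0 for g in g_values]


if __name__ == "__main__":
    phi = orbit_test([5, 3, 3, 1], Fraction(3, 8))
    print(phi, sum(phi))
```

For the four values 5, 3, 3, 1 and $\epsilon = 3/8$ the point with the largest
$g$ is always rejected and the two tied points are each rejected with
probability 1/4, so that the rejection probabilities add up to
$n\epsilon = 3/2$.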

In practice one is usually interested in composite rather than simple
alternatives. We shall therefore consider next the problem of deriving most
stringent tests of hypotheses of invariance against certain classes of
alternatives. This problem may be reduced to that of finding tests which
maximize the minimum power over a class of alternatives by the following simple
theorem of Hunt and Stein [7].

Theorem 3. Given a hypothesis $H$ and a class of alternatives $\{g_\theta\}$,
$\theta \in \Omega$, denote by $\beta^*(\theta)$ the envelope power function
corresponding to the level of significance $\epsilon$, that is, let

(2.6) $\beta^*(\theta) = \sup_{\varphi} \beta(\varphi, \theta),$

where $\beta(\varphi, \theta)$ stands for the power of the test $\varphi$
against the alternative $g_\theta$ and where the least upper bound is taken over
all tests $\varphi$ of size $\epsilon$. Let $\{\Omega_\delta\}$ be a class of
mutually exclusive subsets of $\Omega$ such that
$\bigcup \Omega_\delta = \Omega$ and such that $\beta^*(\theta)$ is constant on
each $\Omega_\delta$. Denote by $\varphi_\delta$ a test which maximizes the
minimum power over $\Omega_\delta$. If $\varphi_\delta = \varphi$ is independent
of $\delta$, then $\varphi$ is most stringent¹ for testing $H$ against $\Omega$
at level of significance $\epsilon$.

For obtaining tests which maximize the minimum power over a class of 
alternatives to a hypothesis of invariance, we can state the following simple 
extension of theorem 2. 

Theorem 4. Let $H$ be a hypothesis of invariance, and let $H_1$ be the class of
alternatives $\{g_\theta\}$, $\theta \in \Omega$. Suppose there exists a subset
$\Omega'$ of $\Omega$ and a probability measure $\lambda$ over $\Omega'$ such
that for the test $\varphi$ of size $\epsilon$ defined as in theorem 2 with

(2.7) $g(z) = \int_{\Omega'} g_\theta(z)\,d\lambda(\theta),$

the integral $\int \varphi g_\theta\,d\mu$ is constant for $\theta$ in
$\Omega'$, and

(2.8) $\int \varphi g_\theta\,d\mu \ge \int \varphi g_{\theta'}\,d\mu$ for all $\theta \in \Omega$, $\theta' \in \Omega'$.

Then $\varphi$ maximizes the minimum power over $\Omega$ at level of
significance $\epsilon$.


¹ A test is said to be most stringent [16] if it minimizes the maximum
difference between envelope power and power, that is, if it minimizes
$\sup_\theta\,[\beta^*(\theta) - \beta(\varphi, \theta)]$.





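Proof. By theorem 2, $\varphi$ is a most powerful test for testing $H$ against
$g$, that is, for any $\varphi'$ of size $\epsilon$

(2.9) $\int \int_{\Omega'} \varphi'(z)\,g_\theta(z)\,d\lambda(\theta)\,d\mu(z) \le \int \int_{\Omega'} \varphi(z)\,g_\theta(z)\,d\lambda(\theta)\,d\mu(z).$

Consequently

(2.10) $\begin{aligned} \inf_{\theta \in \Omega} \int \varphi'(z)\,g_\theta(z)\,d\mu(z) &\le \int_{\Omega'} d\lambda(\theta) \int \varphi'(z)\,g_\theta(z)\,d\mu(z) \\ &= \int \varphi'(z)\,d\mu(z) \int_{\Omega'} g_\theta(z)\,d\lambda(\theta) \le \int \varphi(z)\,d\mu(z) \int_{\Omega'} g_\theta(z)\,d\lambda(\theta) \\ &= \int_{\Omega'} d\lambda(\theta) \int \varphi(z)\,g_\theta(z)\,d\mu(z) = \inf_{\theta \in \Omega} \int \varphi(z)\,g_\theta(z)\,d\mu(z). \end{aligned}$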

3. Normal alternatives. Let $H$ be the hypothesis of invariance under $\Pi$, let
$T(z)$ be the set of points equivalent to $z$ (mod $\Pi$), and let $f$ and $g$
be two functions defined over $\mathcal{Z}$. We shall write $f \sim g$ if there
exists a function $F$ such that

(3.1) $f(z) = F[g(z), T(z)],$

where for any fixed $T(z)$, $F$ is a strictly increasing function of $g$. We
note that $f \sim g$ in the following two special cases:

(i) $f(z) = F[g(z)]$ where $F$ is strictly increasing;

(ii) $f(z) = a(z)g(z) + b(z)$ where $a(z) > 0$ for all $z$, and where
$z_1 \sim z_2 \pmod{\Pi}$ implies $a(z_1) = a(z_2)$, $b(z_1) = b(z_2)$.

The usefulness of this notation stems from the following remark. Let $g^*$ and
$\varphi$ be defined as in (2.2) and (2.1) respectively and let $f \sim g$. If
the test $\psi$ is obtained from $\varphi$ by substituting $f$ and $f^*$ for $g$
and $g^*$ respectively, then $\psi = \varphi$.

The purpose of the present section is to obtain most powerful and most 
stringent tests of some hypotheses of invariance against certain classes of normal 
alternatives. In particular, problems will be exhibited for which various 
non-parametric tests suggested in the literature possess these optimum properties. 

Problem 1. Suppose that the random variables $Z_{ij}$ ($j = 1, \cdots, s_i$;
$i = 1, \cdots, m$) have a joint probability density function, and denote by $H$
the hypothesis that this probability density is invariant under all permutations
of the $s_i$ arguments within the $i$th group for $i = 1, \cdots, m$. Consider
the alternative $H_1$ that all variables are independently distributed with
common variance $\sigma^2$, and that

(3.2) $E(Z_{ij}) = a x_{ij} + b_i,$

where $a$, the $b$'s and the $x$'s are assumed known and where, without
essential loss of generality, we assume $a > 0$. Assume further that

(3.3) $\sum_{j=1}^{s_i} x_{ij} = 0.$





In order to obtain the most powerful test of $H$ against $H_1$, we apply
theorem 2 with

(3.4) $g(z) = C \exp\Big\{ -\frac{1}{2\sigma^2} \sum\sum (z_{ij} - a x_{ij} - b_i)^2 \Big\} \sim \sum\sum (a x_{ij} + b_i) z_{ij} \sim \sum\sum x_{ij} z_{ij}.$

The most powerful test is therefore given by (2.1), if we replace $g(z)$ by
$\sum\sum x_{ij} z_{ij}$. This test being independent of $\sigma^2$, the $b$'s
and $a > 0$, it is uniformly most powerful against the class of alternatives
obtained from $H_1$ by not specifying the values of these parameters but
restricting $a$ to be positive.

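If we drop the restriction $a > 0$, a uniformly most powerful test no longer
exists; we shall instead obtain the most stringent test against this extended
class of alternatives, using theorems 3 and 4. Clearly the envelope power
function is constant on the surfaces $|a|/\sigma$ = constant. Take as the
$\Omega'$ of theorem 4 the set consisting of the two points
$(a, b_1, \cdots, b_m, \sigma)$ and $(-a, b_1, \cdots, b_m, \sigma)$. Let
$\lambda$ assign the probability $\frac{1}{2}$ to each of the two points. Then
the function $g$ of (2.7) becomes

(3.5) $\begin{aligned} g(z) &= \frac{1}{2} \Big( \frac{1}{\sqrt{2\pi}\,\sigma} \Big)^{\sum s_i} \exp\Big\{ -\frac{1}{2\sigma^2} \sum\sum (z_{ij} - a x_{ij} - b_i)^2 \Big\} \\ &\quad + \frac{1}{2} \Big( \frac{1}{\sqrt{2\pi}\,\sigma} \Big)^{\sum s_i} \exp\Big\{ -\frac{1}{2\sigma^2} \sum\sum (z_{ij} + a x_{ij} - b_i)^2 \Big\} \\ &\sim \exp\Big\{ \sum\sum z_{ij}(a x_{ij} + b_i) \Big\} + \exp\Big\{ \sum\sum z_{ij}(-a x_{ij} + b_i) \Big\} \\ &\sim \exp\Big\{ \sum\sum a x_{ij} z_{ij} \Big\} + \exp\Big\{ -\sum\sum a x_{ij} z_{ij} \Big\} \sim \Big| \sum\sum x_{ij} z_{ij} \Big|. \end{aligned}$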

The power of the test $\varphi$ obtained by substituting this expression for $g$
in (2.1) is the same at both points of $\Omega'$. For this test is most powerful
for testing $H$ against the simple alternative $H'$ that the density of the
$Z$'s is given by the first member of (3.5). But under the transformation
$Z'_{ij} = -Z_{ij} + 2b_i$, $H$ and $H'$ and therefore the test $\varphi$ are
left invariant, while the two points of $\Omega'$ are permuted.

Condition (2.8) of theorem 4 is therefore satisfied, and hence $\varphi$
maximizes the minimum power over $\Omega$. Since furthermore $\varphi$ is
independent of the particular set $\Omega$ chosen, it follows from theorem 3
that $\varphi$ is most stringent for the problem under consideration. In case
condition (3.3) is not satisfied, let $x'_{ij} = x_{ij} - x_{i\cdot}$. Then
$\sum_j x'_{ij} = 0$ and $E(Z_{ij}) = a x'_{ij} + b'_i$.

Therefore the test criterion (3.5) becomes

(3.6) $\Big| \sum\sum z_{ij}(x_{ij} - x_{i\cdot}) \Big| = \Big| \sum\sum (z_{ij} - z_{i\cdot})(x_{ij} - x_{i\cdot}) \Big|.$


Some special cases of problem 1 are of particular interest.

a) Suppose that the variables of the $i$th group fall into two subgroups, and
write for $Z_{ij}$: $U_{ij}$ when $j = 1, \cdots, k_i$; $V_{i,j-k_i}$ when
$j = k_i + 1, \cdots, k_i + l_i$ ($k_i + l_i = s_i$). Let

(3.7) $x_{ij} = \begin{cases} 0 & \text{for } j = 1, \cdots, k_i; \\ 1 & \text{for } j = k_i + 1, \cdots, k_i + l_i. \end{cases}$





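Then the alternatives ascribe to the variables normal distributions with common
variance and such that

(3.8) $E(U_{ij}) = b_i$; $E(V_{ij}) = b_i + a$.

The criterion becomes

(3.9) $\sum\sum z_{ij}(x_{ij} - x_{i\cdot}) = \sum_i \Big( \frac{k_i}{k_i + l_i} \sum_j v_{ij} - \frac{l_i}{k_i + l_i} \sum_j u_{ij} \Big) = \sum_i \frac{k_i l_i}{k_i + l_i} (\bar{v}_i - \bar{u}_i)$

or

(3.10) $\Big| \sum_i \frac{k_i l_i}{k_i + l_i} (\bar{v}_i - \bar{u}_i) \Big|$

according as $a$ is restricted to positive values or not.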

b) If we specialize still further and let $m = 1$, we are dealing with a problem
which would coincide with the two sample problem if we added independence
to the assumptions of the hypothesis. (3.10) becomes $|\bar{v} - \bar{u}|$, the
criterion suggested by Pitman [3].

c) If instead of $m = 1$ we set $k_i = l_i = 1$ for $i = 1, \cdots, m$ we are
testing interchangeability within each pair $(u_i, v_i)$ against normal
alternatives under which the means of $U_i$ and $V_i$ are different, the
difference being independent of $i$. The criterion $|\sum (v_i - u_i)|$ to which
(3.10) reduces was first suggested by R. A. Fisher [1].

d) As a last example set $m = 1$ in the original problem. Under the hypothesis
the joint density of $Z_1, \cdots, Z_s$ is symmetric in its $s$ arguments, while
under the alternatives the $Z$'s are normally distributed with common variance
and mean $a x_i + b$. The criterion reduces to
$|\sum (z_i - \bar{z})(x_i - \bar{x})|$ which was proposed by Pitman [3].
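
As an illustration of case b), the following sketch evaluates the criterion
$|\bar{v} - \bar{u}|$ over one equivalence class by brute force. It is our
illustration rather than part of the paper: the function name
`pitman_two_sample` is hypothetical, and instead of the randomized test (2.1)
it reports the proportion of relabellings whose criterion is at least as large
as the observed one, which one would compare with the level of significance.
Since the criterion depends only on which observations are labelled $u$, it is
enough to enumerate subsets rather than all permutations.

```python
from itertools import combinations


def pitman_two_sample(u, v):
    """Proportion of relabellings with |mean(v) - mean(u)| >= the observed value."""
    pooled = list(u) + list(v)
    k, n = len(u), len(u) + len(v)
    total = sum(pooled)

    def criterion(u_sum):
        # |vbar - ubar| when the observations labelled u add up to u_sum.
        return abs((total - u_sum) / (n - k) - u_sum / k)

    observed = criterion(sum(u))
    count = hits = 0
    for idx in combinations(range(n), k):
        count += 1
        # Small tolerance guards against floating-point ties.
        if criterion(sum(pooled[i] for i in idx)) >= observed - 1e-12:
            hits += 1
    return hits / count


if __name__ == "__main__":
    print(pitman_two_sample([1, 2, 3], [10, 11, 12]))
```

With three observations in each sample there are 20 relabellings, and in the
example only the observed split and its mirror image reach the observed
criterion.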

We therefore see that several non-parametric tests which have been discussed 
in the literature are most powerful one-sided or most stringent for testing a 
hypothesis of invariance against certain classes of normal alternatives. In a 
later section we shall indicate to what extent these results remain valid if to 
these hypotheses we add the assumption of independence. 

The remaining problems will be considered somewhat more briefly since the 
proofs follow the same pattern as in problem 1. 

Problem 2. The conditions of problem 1d) are satisfied in particular if
$x_1, \cdots, x_s$ are values taken on by random variables $X_1, \cdots, X_s$,
and if under the alternatives the pairs $(X_i, Z_i)$ have a common bivariate
normal distribution
with orj = . We are then concerned with a problem related to that of testing 

for absence of interclass correlation. For the corresponding intraclass problem,
we consider random variables $X_1, \cdots, X_s$, $Z_1, \cdots, Z_s$ and test the
hypothesis that the joint density of the $2s$ variables is symmetric in all its
arguments, against the alternatives that the pairs $(X_i, Z_i)$ have a common
bivariate normal distribution, the means and variances of the $X$'s and $Z$'s
being the same. We





shall only consider the case of positive correlation. Clearly, the criterion
will be $\sum x_i z_i$ as in the one-sided case of problem 1d). However the tests
differ, in that this expression must now be compared not only with the $s!$
expressions obtained by permuting the $z$'s among themselves, but instead with
the $(2s)!/(2^s s!)$ expressions obtained by considering all possible ways in
which $s$ pairs can be formed from the complete set of $2s$ observations.
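
The count $(2s)!/(2^s s!)$ and the comparison it refers to can be checked by
enumeration for small $s$. The sketch below is ours: `pairings` and
`intraclass_p_value` are hypothetical helper names, and the second function
reports the share of pairings whose sum of within-pair products is at least the
observed $\sum x_i z_i$, rather than the randomized test of theorem 2.

```python
from math import factorial


def pairings(items):
    """Yield every way of splitting an even-length list into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail


def intraclass_p_value(x, z):
    """Share of pairings whose sum of products is >= the observed sum x_i * z_i."""
    observed = sum(a * b for a, b in zip(x, z))
    pooled = list(x) + list(z)
    stats = [sum(a * b for a, b in p) for p in pairings(pooled)]
    return sum(s >= observed for s in stats) / len(stats)


if __name__ == "__main__":
    s = 3
    n_pairings = sum(1 for _ in pairings(list(range(2 * s))))
    print(n_pairings, factorial(2 * s) // (2 ** s * factorial(s)))
    print(intraclass_p_value([1, 2, 3], [1, 2, 3]))
```

The enumeration grows very quickly with $s$, so this is only meant for checking
small cases by hand.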

Problem 3. Consider once more the hypothesis that the joint density of
$Z_1, \cdots, Z_n$ is symmetric in its $n$ arguments, and consider the
alternatives that the $Z$'s are normally distributed with positive circular
serial correlation. Then

(3.11) $g(z) = C \exp\Big\{ -\frac{1}{2\sigma^2} \sum \big[ (z_i - \xi) - \delta(z_{i+1} - \xi) \big]^2 \Big\} \sim \sum z_i z_{i+1},$

where $z_{n+1} = z_1$. The test based on this criterion, which was proposed by
Wald and Wolfowitz [8], is therefore most powerful against the above class of
alternatives.
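
A small enumeration shows how the circular criterion $\sum z_i z_{i+1}$ is
compared with its values over all permutations of the observations. The code is
an illustrative sketch under our own naming (`circular_serial_stat`,
`serial_correlation_p_value`); it reports the proportion of permutations with a
criterion at least as large as the observed one and is practical only for small
$n$.

```python
from itertools import permutations


def circular_serial_stat(z):
    """Sum of z_i * z_{i+1} with the convention z_{n+1} = z_1."""
    n = len(z)
    return sum(z[i] * z[(i + 1) % n] for i in range(n))


def serial_correlation_p_value(z):
    """Proportion of permutations whose criterion is >= the observed one."""
    observed = circular_serial_stat(z)
    stats = [circular_serial_stat(p) for p in permutations(z)]
    return sum(s >= observed for s in stats) / len(stats)


if __name__ == "__main__":
    print(circular_serial_stat([1, 2, 4, 3]))
    print(serial_correlation_p_value([1, 2, 4, 3]))
```

For four distinct values there are only three distinct circular arrangements up
to rotation and reflection, each arising from 8 of the 24 permutations, which
makes the example easy to verify by hand.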

Problem 4. As a last problem, we shall test the hypothesis $H$ that the joint
density of $Z_1, \cdots, Z_n$ is symmetric in its $n$ arguments and symmetric
about each coordinate hyperplane, that is, invariant under the transformation
$z'_i = -z_i$, $z'_j = z_j$ for all $j \ne i$, for $i = 1, \cdots, n$. This will
be tested against the alternatives that the $Z$'s are independently, identically
distributed according to a normal distribution with non-zero mean. If we
restrict this mean to positive values, we get

(3.12) $g(z) = \Big( \frac{1}{\sqrt{2\pi}\,\sigma} \Big)^{n} \exp\Big\{ -\frac{1}{2\sigma^2} \sum (z_i - \xi)^2 \Big\} \sim \sum z_i.$

If on the other hand both positive and negative values are allowed for the mean,
the most stringent test is based on the statistic $|\sum z_i|$.

This test may be appropriate for some situations in which it is customary to 
use the sign test. 
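
The comparison underlying problem 4 can be sketched as follows. Because
$\sum z_i$ is unchanged by permutations of the coordinates, only the $2^n$ sign
changes need to be enumerated. The function name `sign_flip_p_value` is ours,
and the code reports the proportion of sign patterns whose criterion is at
least the observed one instead of carrying out the randomized test (2.1).

```python
from itertools import product


def sign_flip_p_value(z, two_sided=True):
    """Proportion of sign patterns with criterion >= observed.

    The criterion is |sum z_i| when two_sided is True and sum z_i otherwise.
    """
    magnitudes = [abs(x) for x in z]
    observed = abs(sum(z)) if two_sided else sum(z)
    count = hits = 0
    for signs in product((1, -1), repeat=len(z)):
        s = sum(e * m for e, m in zip(signs, magnitudes))
        value = abs(s) if two_sided else s
        count += 1
        if value >= observed:
            hits += 1
    return hits / count


if __name__ == "__main__":
    print(sign_flip_p_value([1, 2, 3], two_sided=False))
    print(sign_flip_p_value([1, 2, 3], two_sided=True))
```

The same enumeration applies to case c) of problem 1 if the $z_i$ are taken to
be the within-pair differences $v_i - u_i$, since interchanging a pair changes
the sign of its difference.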

4. Binomial and other non-normal alternatives. In the present section
we shall be concerned mainly with generalisations of problems 1b) and 1c) of
section 3. As described there, the hypotheses referred to the class of all
probability densities in the usual sense. However, as was pointed out at the end
of section 2, the same tests may be considered as referring to much wider
hypotheses. If they are interpreted in this way, it is possible to greatly widen
the class of alternatives without destroying the optimum properties of the
tests.

Let $Z = (X_1, \cdots, X_n, Y_1, \cdots, Y_n)$ and denote by $\Pi$ the partition
under which two points $z$ and $z'$ are equivalent if they are obtainable from
each other by a permutation of coordinates. Let $H_0$ be the hypothesis of
invariance under $\Pi$. This is a generalization of the hypothesis of complete
symmetry referring to a class of probability densities. Consider as alternative
the class of distributions defined by

(4.1) $P\{Z \in A\} = \int_A C \exp\Big\{ \theta_1 \sum x_i + \theta_2 \sum y_i + \sum r(x_i) + \sum r(y_i) \Big\}\,d\mu(z),$





where the $\theta$'s are any real numbers, where $\mu$ is the $2n$th power of
any one-dimensional measure $\nu$ (and therefore invariant under $\Pi$), and
where $r$ is any $\nu$-measurable function, subject only to the condition that
the integral (4.1) converges when taken over the whole space.

We first consider the one-sided case $\theta_2 > \theta_1$. Using theorem 2 for
a particular $\theta_1$, $\theta_2$, $r$ and $\mu$, we then have

(4.2) $\begin{aligned} g(z) &= C \exp\Big\{ \theta_1 \sum x_i + \theta_2 \sum y_i + \sum r(x_i) + \sum r(y_i) \Big\} \\ &\sim \theta_1 \sum x_i + \theta_2 \sum y_i \sim \theta_1 \sum x_i + \theta_2 \sum y_i - \tfrac{1}{2}(\theta_1 + \theta_2) \sum [x_i + y_i] \\ &= \tfrac{1}{2}(\theta_1 - \theta_2) \Big[ \sum x_i - \sum y_i \Big] \sim \sum y_i - \sum x_i. \end{aligned}$

Since this test does not depend on $\theta_1$, $\theta_2$, $r$ or $\mu$, it is
uniformly most powerful against the one-sided class of alternatives
$\theta_2 > \theta_1$.

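Dropping the restriction $\theta_2 > \theta_1$, we apply theorem 4 with
$\Omega'$ the set consisting of the two points $\theta_1, \theta_2, r, \mu$ and
$\theta_2, \theta_1, r, \mu$. At these two points the envelope power function
obviously takes on the same value. If for $\lambda$ we select the distribution
which assigns equal probabilities to both points, then

(4.3) $\begin{aligned} g(z) &\sim \exp\Big\{ \theta_1 \sum x_i + \theta_2 \sum y_i \Big\} + \exp\Big\{ \theta_2 \sum x_i + \theta_1 \sum y_i \Big\} \\ &\sim \exp\Big\{ \tfrac{1}{2}(\theta_1 - \theta_2) \Big[ \sum x_i - \sum y_i \Big] \Big\} + \exp\Big\{ \tfrac{1}{2}(\theta_2 - \theta_1) \Big[ \sum x_i - \sum y_i \Big] \Big\} \\ &\sim \Big| \sum x_i - \sum y_i \Big| \sim |\bar{y} - \bar{x}|. \end{aligned}$

The power of this test clearly is the same against both points of $\Omega'$.
Since furthermore the test does not depend on the $\theta$'s, $r$, or $\mu$, it
is most stringent against $H_1$.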

A univariate distribution such that

(4.4) $P\{X \in A\} = \int_A C \exp\{ \theta x + r(x) \}\,d\nu(x)$

has been called Laplacian by Tweedie [9], who has studied these distributions
in a different connection. Among others, the normal and $\chi^2$, the binomial
and Poisson distributions are Laplacian. To obtain, for example, the
distribution of a characteristic variable, take for $\nu$ the measure $\nu^*$
which assigns to a set $D$ the values 0, 1 or 2 according as $D$ contains none,
one or both of the points $x = 0$ and $x = 1$, and take as density the function

(4.5) $p^x (1 - p)^{1-x} = (1 - p)\,e^{x \log [p/(1-p)]}.$
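
The identity behind (4.5) is easy to confirm numerically. In the sketch below,
which is ours, the parameter $\theta = \log[p/(1-p)]$ and the constant
$C = 1 - p$ are the values we assume for writing the density of a
characteristic variable in the form (4.4) with $r(x) = 0$; the function names
are hypothetical.

```python
from math import exp, isclose, log


def bernoulli_pmf(x, p):
    """Probability that a characteristic variable with success probability p equals x."""
    return p ** x * (1 - p) ** (1 - x)


def laplacian_form(x, p):
    """The same probability written as C * exp(theta * x) with C = 1 - p."""
    theta = log(p / (1 - p))
    return (1 - p) * exp(theta * x)


if __name__ == "__main__":
    for p in (0.2, 0.5, 0.9):
        for x in (0, 1):
            print(p, x, bernoulli_pmf(x, p), laplacian_form(x, p))
```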

For comparison with tests which have been considered in the literature, one
can specialize the problem just considered, so that the hypothesis $R_0$ and the
class of alternatives $R_1$ consist only of those members of $H_0$ and $H_1$
which are generalized densities with respect to a fixed measure $\mu$. One can
specialize even further and take as alternative any subset of $R_1$ provided
with any point $\theta_1, \theta_2, r$, it also contains the point
$\theta_2, \theta_1, r$. The test clearly will not change with these
specializations, and the test based on (4.3) will therefore possess the same





optimum properties with respect to these special problems as with respect to 
the problem for which it was originally derived. 

If in particular one selects for $\nu$ the measure $\nu^*$ mentioned above, one
obtains the problem for which R. A. Fisher proposed the test based on (4.3). It
follows that this test, Fisher's exact test, is most stringent in connection
with the following problem: The random variables
$X_1, \cdots, X_n, Y_1, \cdots, Y_n$ are characteristic variables, that is, they
can take on only the values 0 and 1. If we let

(4.6) $P\{X_1 = x_1, \cdots, Y_n = y_n\} = P(x_1, \cdots, y_n),$

the hypothesis states that the function $P$ is invariant under all permutations
of its arguments. An equivalent formulation is that the probability (4.6)
depends only on $\sum x_i + \sum y_i$, the total number of "successes". Fisher's
exact test is most stringent against the alternative that the $X$'s and $Y$'s
are samples from two distinct populations of characteristic variables, that is,
two populations corresponding to distinct probabilities of success.
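
For characteristic variables the equivalence class of an observed point
consists of all arrangements of the same total number of successes among the
$2n$ coordinates, so the distribution of $|\bar{y} - \bar{x}|$ over the class
can be written down by counting. The sketch below is ours
(`fisher_two_sided_p` is a hypothetical name); it assumes two samples of equal
size $n$ and returns the proportion of arrangements whose criterion is at least
the observed one, not the randomized test itself.

```python
from math import comb


def fisher_two_sided_p(x_successes, y_successes, n):
    """Proportion of arrangements with |ybar - xbar| >= observed, for two 0/1 samples of size n."""
    total = x_successes + y_successes
    observed = abs(y_successes - x_successes)
    favourable = 0
    # a = number of successes falling among the n coordinates labelled x.
    for a in range(max(0, total - n), min(n, total) + 1):
        if abs((total - a) - a) >= observed:
            # Choose which successes and which failures are labelled x.
            favourable += comb(total, a) * comb(2 * n - total, n - a)
    return favourable / comb(2 * n, n)


if __name__ == "__main__":
    print(fisher_two_sided_p(0, 3, 3))
```

With $n = 3$, no successes among the $X$'s and three among the $Y$'s, only 2 of
the 20 arrangements are as extreme as the observed one.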

Problem 1c) of section 3 can be extended quite analogously. Put again
$Z = (X_1, \cdots, X_n, Y_1, \cdots, Y_n)$, and denote by $\Pi$ the partition
under which two points $z$ and $z'$ are equivalent provided they can be obtained
from each other by a permutation of coordinates in which only the coordinates
within pairs $(X_i, Y_i)$ are interchanged. Consider the hypothesis of
invariance under $\Pi$ with reference to the class of all distributions and as
alternative the class of distributions given by

(4.7) $P\{Z \in A\} = \int_A C \exp\Big\{ \sum \big[ \theta_1 x_i + \theta_2 y_i + r(x_i, y_i) \big] \Big\}\,d\mu(z).$

The $\theta$'s here are any real numbers, $\mu$ is the $2n$th power of any
one-dimensional measure $\nu$, and $r$ is any $\nu$-measurable function such
that (a) the integral (4.7) converges when $A$ is the whole space, and such that
(b) $r(x, y) = r(y, x)$.

Clearly in the one-sided case $\theta_2 > \theta_1$ we will again find
$g(z) \sim \sum y_i - \sum x_i \sim \bar{y} - \bar{x}$, so that the associated
test is uniformly most powerful against this one-sided class of alternatives,
while the test based on $|\bar{y} - \bar{x}|$ is again most stringent against
the full alternative $H_1$.

The class of distributions (4.7) contains the distributions (4.1) as a special
case. If $(X_i, Y_i)$, $i = 1, \cdots, n$, is a sample from a bivariate normal
distribution with $\sigma_x = \sigma_y$, we get another case of (4.7).

As a last somewhat more special problem we mention a discrete analogue of
problem 4 of section 3. Let $Z = (Z_1, \cdots, Z_n)$ and consider the class of
generalized densities given by

(4.8) $P\{Z \in A\} = \int_A P(z_1, \cdots, z_n)\,d\mu(z),$

where $\mu$ is the $n$th power of $\nu^*$. Let $H_0$ be the hypothesis that $P$
is invariant under permutations of the coordinates and under the group generated
by the transformations $z'_i = 1 - z_i$, $z'_j = z_j$, $j \ne i$, for
$i = 1, \cdots, n$. This is an extension of the hypothesis that the probability
of success in a binomial distribution equals $\frac{1}{2}$. The test of $H_0$
against the alternatives that $Z_1, \cdots, Z_n$ is a sample of a characteristic
variable is based on $\sum z_i$ or $|\sum z_i - n/2|$ as $p = P\{Z_i = 1\}$ is
restricted to be greater than $\frac{1}{2}$ or is not so restricted. In the
first case the test is most powerful, in the second most stringent.


5. Hypotheses of invariance for independent variables. To the results obtained
so far, a different interpretation can be given, which throws some light on
certain related problems. Theorem 2 gave sufficient conditions for a test to
be most powerful against a simple alternative $H_1$ for the hypothesis $H_0$ of
invariance under a partition $\Pi$. However, if taken in conjunction with
section 1, the theorem can be interpreted as giving sufficient conditions for a
test to be the most powerful test of structure $S(\epsilon)$ with respect to
$\Pi$ against $H_1$. That is, the theorem is really independent of the
hypothesis, and depends solely on the alternative and on the class of tests
admitted into competition, in our case the class of all tests having structure
$S(\epsilon)$ with respect to $\Pi$. The same remark obviously also applies to
most stringent tests.

Let us now consider a special class of partitions. Let $Z$ stand for the $m$
groups of random variables $(Z_{i1}, \cdots, Z_{is_i})$ ($i = 1, \cdots, m$) and
let $\Pi$ denote the partition under which two points $z$ and $z'$ are
equivalent provided they can be obtained from each other by a permutation of
coordinates which however permutes only the coordinates within the $m$ groups.
Let $\mu$ be the $(s_1 + \cdots + s_m)$th power of a one-dimensional measure
$\nu$, and assume that the probability distribution of $Z$ is absolutely
continuous with respect to $\mu$ and that the $Z$'s are independently
distributed, so that

(5.1) $P\{Z \in A\} = \int_A \prod_{i,j} \big[ f_{ij}(z_{ij})\,d\nu(z_{ij}) \big].$


Under these assumptions consider the hypothesis $H$ that $f_{ij}$ is independent
of $j$, that is, that the $Z$'s are identically distributed within each group.
It easily can be shown that not all admissible tests of $H$ that have size
$\epsilon$ have structure $S(\epsilon)$. However a generalization of a result of
Feller [10] and Scheffé [5] for the case $m = 1$ and $\mu$ = Lebesgue measure,
states that the only tests which are of size $\epsilon$ and similar for $H$ are
the tests of structure $S(\epsilon)$ with respect to $\Pi$ [11]. It follows that
any test which is most powerful or most stringent for testing the hypothesis
$H'$ of invariance under $\Pi$ for the class of generalised densities with
respect to $\mu$, has the same property relative to the class of all tests which
are similar for testing $H$.

As an example, take problem 1b) of section 3. Here $\mu$ is Lebesgue measure,
$m$ is 1, and we put

(5.2) $Z_j = \begin{cases} U_j & \text{for } j = 1, \cdots, k, \\ V_{j-k} & \text{for } j = k + 1, \cdots, k + l = s. \end{cases}$

It was shown in section 3 that the test based on $|\bar{u} - \bar{v}|$, Pitman's
test, is most stringent for testing the hypothesis that the joint density of the
$U$'s and $V$'s is




symmetric in its $k + l$ arguments against the alternative that the variables
are independently normally distributed with common variance and such that
$E(U_i) = \xi$, $E(V_i) = \eta$ where $\xi$ and $\eta$ are any distinct real
numbers. It follows now that the same test is most stringent similar for testing
against the same class of alternatives the hypothesis that
$U_1, \cdots, U_k, V_1, \cdots, V_l$ are independently distributed, all with the
same probability density. This is the hypothesis for which Pitman proposed his
test, and the result just stated is a partial solution of the problem recently
raised by Wilks [12], to determine the class of alternatives for which Pitman's
test is satisfactory.

If we modify the example by taking for $\mu$ instead of Lebesgue measure the
$(k + l)$th power of the measure $\nu^*$ of section 4, we are dealing with
characteristic variables $U_1, \cdots, U_k, V_1, \cdots, V_l$. We have shown
earlier that if $k = l$ the test based on $|\bar{u} - \bar{v}|$ is most
stringent for testing the hypothesis of complete permutability against the
alternative that the $U$'s and $V$'s are samples from two distinct populations
of characteristic variables. If we add to this hypothesis the assumption of
independence of all variables, we obtain a parametric problem, namely
essentially the problem of testing equality of probability of success in two
binomial populations corresponding to the same number of trials. It now follows
that the test based on $|\bar{u} - \bar{v}|$ is most stringent for this problem.
As is well known, it is also the uniformly most powerful, unbiased similar test.

These two examples suffice to illustrate the type of result that can be
obtained. It should perhaps be mentioned that the equivalence discussed at the
beginning of this section can be utilized also in the opposite direction. The
fact, for example, that the test based on $|\bar{u} - \bar{v}|$ is known to be
uniformly most powerful unbiased similar for testing equality of probability of
success in two populations of characteristic variables from which the $U$'s and
$V$'s are samples, proves that this test is uniformly most powerful unbiased for
testing the hypothesis of complete symmetry for the joint generalized density of
the $U$'s and $V$'s.

6. Extension to infinite equivalence classes. The definition of a hypothesis 
of invariance given in section 1—in spite of the restriction to finite equivalence 
classes—was sufficiently general to cover the non-parametric problems that we 
wanted to study. It is possible however to extend the definition so as to allow 
infinite equivalence classes. In this concluding section we shall briefly outline a 
theory based on such a broader definition. This will enable us to point out a 
relationship between the approach of the present paper and the standard 
parametric theory. 

Let $\mathcal{Z}$ be a space of points $z$ and $\mathcal{A}$ an additive class
of subsets of $\mathcal{Z}$. We define a partition of $\mathcal{Z}$ into subsets
$\{S_t\}$ as follows: Let $\mathcal{T}$ be some space, and for each
$t \in \mathcal{T}$ let $S_t$ be a measurable subset of $\mathcal{Z}$ (i.e. an
element of $\mathcal{A}$) such that the $S_t$ are mutually exclusive and
exhaustive. Let $\mathcal{C}_0$ be the class of all $C_0 \in \mathcal{A}$ which
can be expressed in the form

(6.1) $C_0 = \bigcup_{t \in D_0} S_t$





and let $\mathcal{D}_0$ be the class of all $D_0$ occurring in such
relationships. For each $t \in \mathcal{T}$ let $G_t$ be a specified probability
measure over $\mathcal{A}_t$, where $\mathcal{A}_t$ is the class of $A_t$ such
that $A_t \subset S_t$, $A_t \in \mathcal{A}$. Let $Z$ be a random variable
distributed over $\mathcal{Z}$ according to an unknown probability measure $F$.
Let $\psi(z)$ be that $t \in \mathcal{T}$ for which $z \in S_t$, and let
$T = \psi(Z)$. Let $H_0$ be the hypothesis that for each $t \in \mathcal{T}$ the
conditional distribution of $Z$ given $Z \in S_t$ is $G_t$, i.e. that there
exists a probability measure $Q_0$ over $\mathcal{D}_0$ such that for all
$A \in \mathcal{A}$

(6.2) $F(A) = \int G_t(A \cap S_t)\,dQ_0(t).$

It is seen that we have essentially the situation described in section 1, except
that there we assumed further that each $S_t$ was finite and that $G_t$ assigned
equal probabilities to all points of $S_t$.

We say that a test $\varphi$ of $H_0$ has structure $S(\epsilon)$ if the conditional expectation
$E_t[\varphi(Z)]$ of $\varphi(Z)$ given $Z \in S_t$ satisfies

(6.3) $E_t[\varphi(Z)] = \int_{S_t} \varphi\, dG_t = \epsilon$ for all $t$.

The lemmas and theorems stated below are straight-forward generalizations 
of those in section 1 so that no proof will be given. 

Lemma 1'. Any test $\varphi$ of structure $S(\epsilon)$ with respect to $H_0$ is similar and of
size $\epsilon$ for $H_0$.

Lemma 2'. If $\varphi$ is any test of $H_0$ of size $\le \epsilon$, there exists a test $\varphi_1$ of $H_0$ having
structure $S(\epsilon)$ and such that

(6.4) $\int \varphi_1\, dF \ge \int \varphi\, dF$

for all probability measures $F$ for which the conditional distribution of $Z$ given
$Z \in S_t$ is absolutely continuous with respect to $G_t$ for all $t$.

Suppose next there is defined another partition of $\mathcal{Z}$ into sets $\{S'_u\}$ by means
of a space $\mathcal{U}$, and let $\mathcal{A}_1$, $\mathcal{D}_1$ and $Q_1$ refer to this second partition. We shall
assume that for every $t \in \mathcal{T}$, $u \in \mathcal{U}$ either $S'_u \subset S_t$ or $S'_u \cap S_t$ is empty. Let $G'_u$
be a specified probability measure over $\mathcal{A}'_u$ and suppose that for each $t \in \mathcal{T}$
there exists a probability measure $Q_t$ such that for all $A_t \in \mathcal{A}_t$

(6.5) $G_t(A_t) = \int G'_u(A_t \cap S'_u)\, dQ_t(u)$.

If $H_1$ denotes the hypothesis that for each $u \in \mathcal{U}$ the conditional distribution of
$Z$ given $Z \in S'_u$ is $G'_u$, we can state

Theorem 1'. For testing $H_0$ against $H_1$ at level of significance $\epsilon$, the totality of
tests $\varphi$ which have structure $S(\epsilon)$ and for which $z, z' \in S'_u$ implies $\varphi(z) = \varphi(z')$ form
an essentially complete class of admissible tests.

Let $F_1$ be a distribution not in $H_0$, and for each $t \in \mathcal{T}$ let $G_{1t}$ be the conditional
distribution of $Z$ given $Z \in S_t$. We suppose that for each $t \in \mathcal{T}$, $G_{1t}$ is chosen
to be a true probability measure, which is possible in most cases of practical
interest (see Doob [13] for a discussion of this point). Then we have the
equivalent of theorem 2:

Theorem 2'. Let

(6.6) $G_{1t}(A_t) = \int_{A_t - H_t} g_t\, dG_t + G_{1t}(A_t \cap H_t)$

for all $A_t \subset S_t$, where in accordance with the Radon-Nikodym Theorem [14], $g_t$
is a non-negative function integrable over $S_t$, and $H_t \subset S_t$ has $G_t$ measure 0 and
does not depend on $A_t$. For testing $H_0$ against $F_1$, a most powerful test of size $\epsilon$
is given by $\varphi(z) = \varphi_t(z)$ for $z \in S_t$ where

(6.7) $\varphi_t(z) = \begin{cases} 1 & \text{if } z \in H_t \\ 1 & \text{if } g_t(z) > C_t \\ a_t & \text{if } g_t(z) = C_t \\ 0 & \text{if } g_t(z) < C_t \end{cases}$

where $C_t$ and $a_t$ are so chosen that $\varphi$ has structure $S(\epsilon)$.
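The choice of $C_t$ and $a_t$ in (6.7) can be illustrated on a single finite class. The following Python sketch is only an illustration under simplifying assumptions that are not part of the theorem: $S_t$ is finite, $G_t$ is uniform on it, and $H_t$ is empty. The function names `randomized_cutoff` and `conditional_size` are hypothetical.

```python
from fractions import Fraction

def randomized_cutoff(g_values, eps):
    """Return (C, a): reject when g > C, and with probability a when g == C,
    so that the rejection probability under the uniform distribution on the
    listed points is exactly eps."""
    n = len(g_values)
    target = Fraction(eps) * n            # rejection mass needed, counted in points
    mass_above = Fraction(0)
    for c in sorted(set(g_values), reverse=True):
        ties = sum(1 for g in g_values if g == c)
        if mass_above + ties >= target:
            return c, (target - mass_above) / ties
        mass_above += ties
    raise ValueError("eps must lie in [0, 1]")

def conditional_size(g_values, c, a):
    """Conditional expectation of the test over the class (uniform weights)."""
    reject = sum(1 for g in g_values if g > c)
    ties = sum(1 for g in g_values if g == c)
    return (reject + a * ties) / len(g_values)

# Values of g_t on the four points of one class S_t (made-up numbers).
g = [5, 3, 3, 1]
C, a = randomized_cutoff(g, Fraction(1, 2))
size = conditional_size(g, C, a)
```

Exact rational arithmetic is used so that the structure condition holds exactly rather than up to rounding.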

Theorems 3 and 4 require no modification. 

As in the case of finite equivalence classes the results just outlined can be
interpreted differently. Again the theorems are really independent of the
hypotheses, but depend only on the alternatives and on the class of tests admitted
into competition. This class of tests $\varphi$ is in the present case defined by condition
(6.3), that the conditional expectation of $\varphi$ given $Z \in S_t$ equals $\epsilon$. But this is
just the condition which in the standard approach to the problem of testing a
composite parametric hypothesis for which $T$ is a sufficient statistic, by means
of similar regions, is frequently found to be the necessary and sufficient condition
for $\varphi$ to be similar. (See for example [15]). For these cases therefore the
hypotheses of the present section represent non-parametric analogues to which
the same tests apply with the same optimum properties but without the a priori
restriction to similar regions.

As a simple illustration of this remark, let $Z = (Z_1, \cdots, Z_n)$, and let
$T = \sum_{i=1}^n Z_i^2$. For the conditional distribution of $Z$ given $T = t$ take the uniform
distribution over the sphere $T = t$, and for $\mu$ take Lebesgue measure. Then the
hypothesis $H$ states merely that the joint probability density of the $Z$'s is a
function only of $\sum_{i=1}^n Z_i^2$. If we add to this the assumption of independence of
the $Z$'s, we obtain the new hypothesis $H'$ that the $Z$'s are a sample from a
normal distribution with zero mean. The tests $\varphi$ for which the conditional
expectation over each sphere is $\epsilon$ constitute the only admissible tests of $H$
and the only admissible similar tests of $H'$. If as alternatives we consider that
the $Z$'s are a sample from a normal distribution with mean $\xi > 0$, the test

(6.8) $\cdots$

is uniformly most powerful for $H$ and uniformly most powerful similar for $H'$.
If we do not restrict $\xi$ to positive values, the test

(6.9) $\dfrac{|\bar{x}|}{\sqrt{\sum (x_i - \bar{x})^2}} > C''$,

Student's test, is uniformly most powerful unbiased and most stringent for
testing $H$, uniformly most powerful unbiased similar and most stringent similar
for testing $H'$.


REFERENCES 

[1] R. A. Fisher, Design of Experiments, Oliver and Boyd, Edinburgh, 1935.

[2] J. Neyman, K. Iwaskiewicz and St. Kolodziejczyk, “Statistical problems in
agricultural experimentation,” Roy. Stat. Soc. Jour. Suppl., Vol. 2 (1935), p. 107.

[3] E. J. G. Pitman, “Significance tests which may be applied to samples from any
populations,” Roy. Stat. Soc. Jour. Suppl., Vol. 4 (1937), p. 119; II. The correlation
coefficient test, Roy. Stat. Soc. Jour. Suppl., Vol. 4 (1937), p. 225; III. The
analysis of variance test, Biometrika, Vol. 29 (1938), p. 322.

[4] E. L. Lehmann and C. Stein, “Most powerful tests of composite hypotheses. I.
Normal distributions,” Annals of Math. Stat., Vol. 19 (1948).

[5] H. Scheffé, “On a measure problem arising in the theory of non-parametric tests,”
Annals of Math. Stat., Vol. 14 (1943), p. 227.

[6] A. Wald, “An essentially complete class of admissible decision functions,” Annals
of Math. Stat., Vol. 18 (1947), p. 549.

[7] G. Hunt and C. Stein, “Most stringent tests of statistical hypotheses,” unpublished.

[8] A. Wald and J. Wolfowitz, “An exact test for randomness in the non-parametric
case, based on serial correlation,” Annals of Math. Stat., Vol. 14 (1943), p. 378.

[9] M. C. K. Tweedie, “Functions of a statistical variate with given means, with special
reference to Laplacian distributions,” Cam. Phil. Soc. Proc., Vol. 43 (1947), p. 41.

[10] W. Feller, “Note on regions similar to the sample space,” Stat. Res. Memoirs, Vol. 2
(1938), p. 117.

[11] E. L. Lehmann and H. Scheffé, “Completeness, similar regions and unbiased
estimation,” unpublished.

[12] S. S. Wilks, “Order Statistics,” Am. Math. Soc. Bull., Vol. 54 (1948), p. 6.

[13] J. L. Doob, “Asymptotic properties of Markoff transition probabilities,” Trans. Amer.
Math. Soc., Vol. 63 (1948), footnote p. 399.

[14] S. Saks, Theory of the Integral, Stechert, 1937.

[15] J. Neyman and E. S. Pearson, “On the problem of the most efficient tests of statistical
hypotheses,” Roy. Soc. London Phil. Trans., Ser. A., Vol. 231 (1933), p. 289.

[16] A. Wald, On the Principles of Statistical Inference, Notre Dame Mathematical Lectures,
Number 1.



ESTIMATION OF THE PARAMETERS OF A SINGLE EQUATION IN A
COMPLETE SYSTEM OF STOCHASTIC EQUATIONS¹,²

By T. W. Anderson³ and Herman Rubin⁴

Columbia University and Institute for Advanced Study 

1. Summary. A method is given for estimating the coefficients of a single 
equation in a complete system of linear stochastic equations (see expression 
(2.1)), provided that a number of the coefficients of the selected equation are 
known to be zero. Under the assumption of the knowledge of all variables in 
the system and the assumption that the disturbances in the equations of the 
system are normally distributed, point estimates are derived from the regressions 
of the jointly dependent variables on the predetermined variables (Theorem 1). 
The vector of the estimates of the coefficients of the jointly dependent variables 
is the characteristic vector of a matrix involving the regression coefficients 
and the estimate of the covariance matrix of the residuals from the regression 
functions. The vector corresponding to the smallest characteristic root is 
taken. An efficient method of computing these estimates is given in section 7. 
The asymptotic theory of these estimates is given in a following paper [2]. 

When the predetermined variables can be considered as fixed, confidence 
regions for the coefficients can be obtained on the basis of small sample theory 
(Theorem 3). 

A statistical test for the hypothesis of over-identification of the single equation 
can be based on the characteristic root associated with the vector of point 
estimates (Theorem 2) or on the expression for the small sample confidence 
region (Theorem 4). This hypothesis is equivalent to the hypothesis that the 
coefficients assumed to be zero actually are zero. The asymptotic distribution 
of the criterion is shown in a following paper [2] to be that of $\chi^2$.

2. A complete system of linear difference equations. In many fields of study 
such as economics, biology, and meteorology the occurrence of values of the 
observed quantities can be described in terms of a probability model which, as a 
first approximation, is a set of stochastic equations. Consider a (row) vector $y_t$
of quantities which are observed at time $t$. Suppose that these quantities are
jointly dependent on a vector $z_t$ of quantities “predetermined” at time $t$ (i.e.,
known without error at time $t$). Some of the coordinates of $z_t$ may be coordinates
of $y_{t-1}$, $y_{t-2}$, etc.; other coordinates of $z_t$ are quantities which are assumed given
constants. The vectors $y_t$ ($t = 1, 2, \cdots, T$) are called endogenous. The
part of the set $z_t$ which does not consist of lagged endogenous variables is called
exogenous; these are treated as “fixed variates.” For convenience we shall
think of $t$ as indicating a point of time, although it may in many cases indicate
the ordering of a sample in another dimension, or, indeed, the $t$ may indicate
simply a numbering of the observations (if $z_t$ is entirely exogenous). In a
dynamic economic model the endogenous variables are economic quantities
such as amount of investment, interest rate, amount of consumption, etc. The
exogenous variables are those quantities which are considered to be determined
primarily outside the economic system, such as amount of rainfall, amount of
government expenditures, time, etc.

¹ This paper will be included in Cowles Commission Papers, New Series, No. 36.

² The results in this paper were presented at meetings of the Institute of Mathematical
Statistics in Washington, D. C., April 12, 1946 (Washington Chapter) and in Ithaca, N. Y.,
August 23, 1946.

³ Fellow of the John Simon Guggenheim Memorial Foundation; Research Consultant
of the Cowles Commission for Research in Economics.

⁴ National Research Fellow; Research Consultant of the Cowles Commission for
Research in Economics.

A simple probability model may be set up on the assumption that these 
quantities approximately satisfy certain linear equations. Specifically the model 
is 

(2.1) $B_{yy} y_t' + \Gamma_{yz} z_t' = \epsilon_t'$

where $\epsilon_t$ is a (row) vector having a probability distribution with expected value
zero and $B_{yy}$ and $\Gamma_{yz}$ are matrices, the former being non-singular. Primes (')
indicate transposition of vectors and matrices. If there are G jointly dependent 
variables, there are G component equations in (2.1); that is, there are as many 
equations as there are variables depending on the system. The fact that yt 
and Zt do not satisfy linear equations exactly is indicated by setting the linear 
forms not equal to zero, but equal to random elements, called disturbances. 
We will call the component equations of (2.1) structural equations, for they 
express the structure of the system. For example, one equation involving the 
amount of goods consumed, the prices of these goods, the size of the national 
income, etc., might describe the behaviour of the consumers. Another equation 
involving interest rate might relate to the behaviour of investors. 

It has been shown [7], [11], that in general one cannot use ordinary regression
methods to estimate the matrices $B_{yy}$ and $\Gamma_{yz}$ and the parameters of an assumed
distribution of the disturbances. Mann and Wald [9], for a special class of
systems, and Koopmans, Rubin, and Leipnik [11], in a more general case, have
obtained maximum likelihood estimates of all of the parameters for the case of
the $\epsilon_t$ having a normal multivariate distribution.

Since $B_{yy}$ is non-singular, we can rewrite (2.1) in a different form, called the
reduced form,

(2.2) $y_t' = -B_{yy}^{-1}\Gamma_{yz} z_t' + B_{yy}^{-1}\epsilon_t'$,

or as

(2.3) $y_t' = \Pi_{yz} z_t' + \eta_t'$,

where

(2.4) $\Pi_{yz} = -B_{yy}^{-1}\Gamma_{yz}$,

(2.5) $\eta_t' = B_{yy}^{-1}\epsilon_t'$.

If $\epsilon_t$ has a normal distribution, so does $\eta_t$. For a given $t$ then, we can consider
the model as specifying a distribution of $y_t$ with conditional expected value $z_t\Pi_{yz}'$.
It is clear that we can multiply (2.1) on the left by any non-singular matrix 
and obtain a system of equations which defines the same distribution of yt. 
On the other hand, it has been shown that the only transformations of $(B_{yy}\ \Gamma_{yz})$
which preserve the linearity of the system of equations are multiplications on the
left by non-singular matrices. If there are a priori restrictions on $(B_{yy}\ \Gamma_{yz})$,
the set of matrices which result in new coefficient matrices satisfying these
restrictions is correspondingly decreased. If the set of admissible matric 
multipliers includes only diagonal matrices the system of structural equations 
is said to be identified. In this case only multiplication of all coefficients by a 
given constant is permitted. 

Knowledge of the distribution of $y_t$ given $z_t$ is obviously equivalent to
knowledge of $\Pi_{yz}$ in (2.3) and the distribution of $\eta_t$. When the system is identified,
the matrix $B_{yy}$ and

(2.6) $\Gamma_{yz} = -B_{yy}\Pi_{yz}$

are determined uniquely except for multiplication on the left by a diagonal
matrix. Thus identification of a system is equivalent to the possibility of
inferring the structural equations from knowledge of the distribution. The
estimation of all coefficients of $B_{yy}$ and $\Gamma_{yz}$ has been considered in [11].
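The passage from the structural form (2.1) to the reduced form (2.4), and the fact that multiplying (2.1) on the left by a non-singular matrix leaves $\Pi_{yz}$ unchanged, can be checked numerically. The following Python sketch uses a made-up system with $G = 2$ and $K = 2$; the coefficient values and the helper names `inv2`, `matmul` and `reduced_form` are illustrative assumptions, not taken from the paper.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul(p, q):
    """Product of two matrices given as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]

def reduced_form(B, Gamma):
    """Pi = -B^{-1} Gamma, as in (2.4)."""
    return [[-value for value in row] for row in matmul(inv2(B), Gamma)]

# Hypothetical structural coefficients (rows are equations).
B = [[1.0, -0.5], [-0.2, 1.0]]
Gamma = [[-1.0, 0.0], [0.0, -2.0]]
Pi = reduced_form(B, Gamma)

# Multiply the whole system on the left by a non-singular matrix A.
A = [[2.0, 1.0], [0.0, 3.0]]
Pi_transformed = reduced_form(matmul(A, B), matmul(A, Gamma))
```

Both calls give the same matrix up to rounding, which is the sense in which the distribution of $y_t$ alone cannot distinguish $(B_{yy}\ \Gamma_{yz})$ from its transforms.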

3. A single identified equation of a complete system. In many studies the 
investigator may be interested only in a specific equation of the system, say, 

(3.1) $\beta_y y_t' + \gamma_z z_t' = \zeta_t$,

where $\zeta_t$ is a scalar disturbance. The investigator may not be interested in the
entire system (2.1) of which (3.1) is one component. Since a considerable
amount of computation is necessary to estimate all parameters of a complete
system, there arises the problem of estimating only the coefficients of a single
equation. It is desirable to do this with the least possible restrictive assumptions
about the part of the system which is not the selected structural equation. In
order to treat the selected equation at all, we require that it is identified; that is,
that there are certain restrictions on $(\beta_y, \gamma_z)$ such that no linear combination of
rows of $(B_{yy}\ \Gamma_{yz})$ satisfies these restrictions other than a constant times $(\beta_y, \gamma_z)$.
It is not necessary to assume that every component equation is identified; that is,
that the entire system is identified.

We shall suppose that the restrictions imposed are that certain coefficients
are zero. We can arrange the components of the vectors so that the restrictions
are

(3.2) $(\beta_y, \gamma_z) = (\beta, 0, \gamma, 0)$,

where

(3.3) $\beta = (\beta^1, \cdots, \beta^H)$

has $H$ coefficients not assumed to be zero and

(3.4) $\gamma = (\gamma^1, \cdots, \gamma^F)$

has $F$ coefficients not assumed to be zero.

It will be convenient to divide the $G$ components of $y_t$ into two groups (in
number $H$ and $G - H$, respectively), and the $K$ components of $z_t$ into two groups
(in number $F$ and $D$ respectively) according to whether or not the components
enter into (3.1) with coefficients not assumed to be zero. Let

(3.5) $y_t = (x_t, r_t)$,

(3.6) $z_t = (u_t, v_t)$,

where

(3.7) $x_t = (x_{t1}, \cdots, x_{tH})$,

(3.8) $r_t = (r_{t1}, \cdots, r_{t,G-H})$,

(3.9) $u_t = (u_{t1}, \cdots, u_{tF})$,

(3.10) $v_t = (v_{t1}, \cdots, v_{tD})$.

Then the selected equation is

(3.11) $\beta x_t' + \gamma u_t' = \zeta_t$.


Now let us see how the identification is accomplished. Partitioning $\Pi_{yz}$ into
$H$ and $G - H$ rows and $F$ and $D$ columns as

$\Pi_{yz} = \begin{pmatrix} \Pi_{xu} & \Pi_{xv} \\ \Pi_{ru} & \Pi_{rv} \end{pmatrix}$,

we can write the reduced form (2.3) as

(3.12) $x_t' = \Pi_{xu} u_t' + \Pi_{xv} v_t' + \delta_t'$,

(3.13) $r_t' = \Pi_{ru} u_t' + \Pi_{rv} v_t' + \theta_t'$,

where

$\eta_t = (\delta_t, \theta_t)$.

Multiplying the above equation with $(\beta, 0)$ we obtain

(3.14) $\beta x_t' = \beta\Pi_{xu} u_t' + \beta\Pi_{xv} v_t' + \beta\delta_t'$.

Since this must be identical to (3.11) we must have

(3.15) $\gamma = -\beta\Pi_{xu}$,

(3.16) $0 = -\beta\Pi_{xv}$.

The matrices $\Pi_{xu}$ and $\Pi_{xv}$ are defined by the distribution of $x_t$ given $u_t$ and $v_t$
(for at least $K = D + F$ linearly independent values of $u_t$, $v_t$). The equation
(3.11) is identified if and only if the solution of (3.15) and (3.16) for $\beta$ and $\gamma$ is
unique except for a constant of proportionality. This depends on the rank of
$\Pi_{xv}$ being $H - 1$. Thus a necessary and sufficient condition that (3.11) is
identified is that the rank of the regression of $x_t$ on $v_t$ be $H - 1$. In particular this
implies that the number of coordinates of $v_t$ (the number of zero coefficients in $\gamma_z$) be
at least $H - 1$. It can easily be shown that this condition is equivalent to requiring
that the rank of the matrix obtained by selecting the $G - H$ columns of $B_{yy}$
and the $D$ columns of $\Gamma_{yz}$ corresponding to the coefficients assumed zero in the
selected equation is $G - 1$. This is the condition given by Koopmans and
Rubin [11]. Other homogeneous linear restrictions can be put in this form.
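To make (3.15) and (3.16) concrete, the following Python sketch recovers $\beta$ and $\gamma$ from the reduced-form coefficients in the smallest non-trivial case $H = 2$, $F = 1$, $D = 1$. The numerical values of $\Pi_{xu}$ and $\Pi_{xv}$ are made up for illustration, the normalization $\beta^1 = 1$ is assumed, and `identify_beta` is a hypothetical helper name.

```python
# Reduced-form coefficients of the H = 2 dependent variables on the single
# included variable u and the single excluded variable v (made-up values).
Pi_xu = [10 / 9, 2 / 9]
Pi_xv = [10 / 9, 20 / 9]

def identify_beta(pi_xv):
    """Solve beta * Pi_xv = 0, equation (3.16), for H = 2 and D = 1,
    with the normalization beta^1 = 1."""
    if pi_xv[0] == 0 and pi_xv[1] == 0:
        raise ValueError("rank of Pi_xv is 0, not H - 1: equation not identified")
    if pi_xv[1] == 0:
        raise ValueError("the normalization beta^1 = 1 is not possible here")
    return [1.0, -pi_xv[0] / pi_xv[1]]

beta = identify_beta(Pi_xv)
gamma = -(beta[0] * Pi_xu[0] + beta[1] * Pi_xu[1])    # equation (3.15)
```

With $D = H - 1$ the vector $\beta$ is determined exactly by $\Pi_{xv}$; when $D$ is larger, a sample estimate of $\Pi_{xv}$ will generally have full rank and (3.16) has no non-zero solution.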

If the vector $\epsilon_t$ is normally distributed with mean zero the vector $\eta_t$ is normally
distributed with mean zero. Let the covariance matrix of $\delta_t$ be $\Omega_{xx}$. Then
the variance of $\zeta_t$ is

(3.17) $\sigma^2 = \beta\Omega_{xx}\beta'$.

The constant of proportionality in $\beta$ may be determined by setting the variance
of $\zeta_t$, $\sigma^2$, $= 1$; another normalization is

(3.18) $\beta^i = 1$,

where $\beta^i$ is the $i$th coordinate of $\beta$. In general the normalization can be written as

(3.19) $\beta\Phi_{xx}\beta' = 1$,

where $\Phi_{xx}$ can be either a known constant or can be a known function of unknown
parameters.

As an estimation procedure for $\beta$ and $\gamma$ when $D = H - 1$, M. A. Girshick
suggested in an unpublished note that one solve equations (3.15) and (3.16)
with $(\Pi_{xu}, \Pi_{xv})$ replaced by $(P_{xu}\ P_{xv})$, the sample regression of $x$ on $u$ and $v$.
By these means Girshick found confidence regions (see section 8) for the
parameters of a two equation system. A similar idea lies behind a method of
O. Reiersøl [10].

The present paper develops a method for handling the case of $D \ge H$. In
this case the rank of $P_{xv}$ is usually $H$, thus giving no admissible estimate of $\beta$.
The proposed method follows the approach used in discriminant problems.

In a second paper [2] the present authors shall give asymptotic properties of 
these estimates that give a certain justification for the use of them. Under 
very general assumptions concerning the $v_t$ and the $\epsilon_t$ we prove that these
estimates are consistent. These hypotheses permit the investigator to neglect 
some predetermined variables absent from his particular equation. Alternative 
assumptions include the case of the other G — 1 equations being non-linear. 
Finally, it is shown that the estimates are asymptotically normally distributed. 
For this result it is not necessary to assume that the disturbances are normally 
distributed, or even that they have identical distributions. 

4. A description of the estimation procedure. In a sense the dependence 
of the endogenous variables $x_t$ on the predetermined variables $u_t$ and $v_t$ is given
by the matrix $(\Pi_{xu}\ \Pi_{xv})$ of regression coefficients of $x_t$ on $u_t$ and $v_t$. The
interdependence of the coordinates of $x_t$ indicated by the selected equation
nullifies the dependence on $v_t$; that is,

(4.1) $\beta\Pi_{xv} = 0$.

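Suppose we wish to estimate $\beta$ and $\gamma$ from a sample of $T$ observations:
$(x_1, z_1), (x_2, z_2), \cdots, (x_T, z_T)$. The information we need can be summarized
in the second order moment matrices

(4.2) $M_{xx} = \frac{1}{T}\sum_{t=1}^{T} x_t' x_t$,

(4.3) $M_{xz} = \frac{1}{T}\sum_{t=1}^{T} x_t' z_t$,

(4.4) $M_{zz} = \frac{1}{T}\sum_{t=1}^{T} z_t' z_t$.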

Since one coordinate of $z_t$ may be unity there is no advantage in taking these
moments about the mean. We shall find it more convenient to use instead of
$v_t$ the part of $v_t$ that is orthogonal to $u_t$; that is, we shall use

(4.5) $s_t' = v_t' - M_{vu}M_{uu}^{-1}u_t'$.

The moments are then $M_{xx}$, $M_{xu}$, $M_{uu}$,

(4.6) $M_{xs} = M_{xv} - M_{xu}M_{uu}^{-1}M_{uv}$,

and

(4.7) $M_{ss} = M_{vv} - M_{vu}M_{uu}^{-1}M_{uv}$.

We can express the reduced form as

(4.8) $x_t' = \Pi^*_{xu}u_t' + \Pi_{xs}s_t' + \delta_t'$,

where

(4.9) $\Pi^*_{xu} = \Pi_{xu} + \Pi_{xv}M_{vu}M_{uu}^{-1}$, $\Pi_{xs} = \Pi_{xv}$.

An estimate of $\Pi_{xs}$ is the regression of $x$ on $s$,

(4.10) $P_{xs} = M_{xs}M_{ss}^{-1}$.

To estimate $\beta$ we take the vector $\beta$ that makes $\beta P_{xs}$ smallest in the metric
determined by the moment matrix of the residuals

(4.11) $W_{xx} = M_{xx} - P_{xs}M_{ss}P_{xs}' - P_{xu}M_{uu}P_{xu}'$,

where

(4.12) $P_{xu} = M_{xu}M_{uu}^{-1}$.

This is the natural generalization of least squares; the greatest weight is given
to the component with least variance. This estimate is the vector satisfying

(4.13) $(P_{xs}M_{ss}P_{xs}' - \nu W_{xx})b' = 0$

which is associated with the smallest root of

(4.14) $|P_{xs}M_{ss}P_{xs}' - \nu W_{xx}| = 0$.

This vector is normalized to give $\hat\beta$, and the estimate of $\gamma$ is $-\hat\beta P_{xu}$.

In section 5 we derive these estimates by the method of maximum likelihood 
under certain assumptions. Although it is assumed that the disturbances are 
normally distributed for this derivation, the estimates can be used in more 
general situations. This theory is in one sense a special case of the theory of 
estimating a matrix of means of a given dimensionality which is an extension 
of the discriminant function theory [5]. For an application of this method of 
estimation see [6]. 
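The steps (4.5) to (4.14) can be traced on simulated data. The following Python sketch is a minimal illustration for the special case $H = 2$, $F = 1$ (a constant term), $D = 2$, with the normalization $\beta^1 = 1$; the data-generating model, the sample size, and all function names are assumptions made for the example. For $H = 2$ the determinantal equation (4.14) is a quadratic in $\nu$, so its smallest root is written out in closed form.

```python
import math
import random

def moment(a, b):
    """Second-order moment matrix (1/T) * sum_t a_t' b_t for lists of row vectors."""
    T = len(a)
    return [[sum(a[t][i] * b[t][j] for t in range(T)) / T
             for j in range(len(b[0]))] for i in range(len(a[0]))]

def matmul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def sub(p, q):
    return [[x - y for x, y in zip(r, s)] for r, s in zip(p, q)]

def inv2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def estimate(x, u, v):
    """Return (beta, gamma, nu) for H = 2, F = 1, D = 2, with beta^1 = 1."""
    Mxx, Mxu, Mxv = moment(x, x), moment(x, u), moment(x, v)
    Muu, Muv, Mvv = moment(u, u), moment(u, v), moment(v, v)
    Muu_inv = [[1.0 / Muu[0][0]]]                  # F = 1, so M_uu is a scalar
    Pxu = matmul(Mxu, Muu_inv)                     # (4.12)
    Mxs = sub(Mxv, matmul(Pxu, Muv))               # (4.6)
    Mss = sub(Mvv, matmul(matmul(transpose(Muv), Muu_inv), Muv))    # (4.7)
    Pxs = matmul(Mxs, inv2(Mss))                   # (4.10)
    A = matmul(matmul(Pxs, Mss), transpose(Pxs))   # P_xs M_ss P_xs'
    W = sub(sub(Mxx, A), matmul(matmul(Pxu, Muu), transpose(Pxu)))  # (4.11)
    # Smallest root of |A - nu W| = 0, equation (4.14), a quadratic when H = 2.
    k0 = A[0][0] * A[1][1] - A[0][1] ** 2
    k1 = W[0][0] * W[1][1] - W[0][1] ** 2
    k2 = 0.5 * (A[0][0] * W[1][1] + A[1][1] * W[0][0] - 2 * A[0][1] * W[0][1])
    nu = (k2 - math.sqrt(max(k2 * k2 - k0 * k1, 0.0))) / k1
    theta = sub(A, [[nu * w for w in row] for row in W])
    beta = [1.0, -theta[0][1] / theta[1][1]]       # second row of (4.13)
    gamma = -(beta[0] * Pxu[0][0] + beta[1] * Pxu[1][0])
    return beta, gamma, nu

# Simulated data: the selected equation is x1 - 0.5*x2 - 1 = zeta,
# so the true values are beta = (1, -0.5) and gamma = -1.
random.seed(0)
x, u, v = [], [], []
for _ in range(2000):
    v1, v2 = random.gauss(0, 1), random.gauss(0, 1)
    e2, zeta = random.gauss(0, 1), random.gauss(0, 1)
    x2 = 0.5 + v1 + v2 + e2
    x1 = 1.0 + 0.5 * x2 + zeta
    x.append([x1, x2])
    u.append([1.0])
    v.append([v1, v2])

beta, gamma, nu = estimate(x, u, v)
```

Since $x_2$ is correlated with the disturbance of the selected equation, an ordinary regression of $x_1$ on $x_2$ would not be expected to recover $-0.5$ here, whereas the estimate above should be close to it for a sample of this size.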

5. Derivation of maximum likelihood estimates. We derive the estimates of
$\beta$, $\gamma$, and $\sigma^2$ under the following assumptions:

Assumption A. The selected structural equation

(3.11) $\beta x_t' + \gamma u_t' = \zeta_t$

is one equation of a complete linear system of $G$ stochastic equations. The equation
is identified by the fact that if $H$ is the number of coordinates in $x_t$ there are at least
$H - 1$ coordinates in $v_t$, the vector of predetermined variables not in (3.11) but
in the system.

Assumption B. At time $t$ all of the coordinates of $z_t = (u_t, v_t)$ are given.

Assumption C. The coordinates of $z_t$ are given functions of exogenous variables
and of coordinates of $y_{t-1}$, $y_{t-2}$, $\cdots$. If coordinates of $y_0$, $y_{-1}$, $\cdots$ are involved
in $z_t$, they will be considered as given numbers. The moment matrix $M_{zz}$ is
non-singular with probability one.

Assumption D. The disturbance vectors $\delta_t$ are distributed serially independently
and normally with mean zero and covariance matrix $\Omega_{xx}$.

We shall consider normalizations (3.19) where $\Phi_{xx}$ may be a function of other
parameters, but

(5.1) $\partial\Phi_{xx}/\partial\beta = 0$.

We can state the results in a theorem: 

Theorem 1. Under assumptions A, B, C, and D the maximum likelihood
estimate of $\beta$ is

(5.2) $\hat\beta = b/\sqrt{b\hat\Phi_{xx}b'}$,

where $b$ is the solution of

(4.13) $(P_{xs}M_{ss}P_{xs}' - \nu W_{xx})b' = 0$

corresponding to the smallest value of $\nu$ and $P_{xs}$ is defined by (4.10), $M_{xs}$ by (4.6),
and $W_{xx}$ by (4.11). An estimate of $\gamma$ based on the maximum likelihood estimate
is given by

(5.3) $\hat\gamma = -\hat\beta P_{xu}$,

where $P_{xu}$ is given by (4.12). The estimate of $\sigma^2$ is

(5.4) $\hat\sigma^2 = (1 + \nu)/b\hat\Phi_{xx}b'$

if

(5.5) $bW_{xx}b' = 1$.

We apply the method of maximum likelihood to

(5.6) $L = (2\pi)^{-\frac{1}{2}TH}\,|\Omega_{xx}|^{-\frac{1}{2}T}\exp\left\{-\tfrac{1}{2}\sum_{t=1}^{T}(x_t - u_t\Pi_{xu}' - v_t\Pi_{xv}')\Omega_{xx}^{-1}(x_t' - \Pi_{xu}u_t' - \Pi_{xv}v_t')\right\}$

under the restrictions (4.1) and (3.19). Replacing $v_t$ by $s_t$ and adding (4.1)
and (3.19) multiplied by Lagrange multipliers $\lambda$ (a vector of $D$ coordinates) and
$\phi$ respectively to the logarithm of $L$ we obtain after division by $T$

(5.7) $-\tfrac{1}{2}H\log 2\pi + \tfrac{1}{2}\log|\Omega_{xx}^{-1}| + \beta\Pi_{xs}\lambda' + \phi(\beta\Phi_{xx}\beta' - 1) - \frac{1}{2T}\sum_{t=1}^{T}(x_t - u_t\Pi^{*\prime}_{xu} - s_t\Pi_{xs}')\Omega_{xx}^{-1}(x_t' - \Pi^*_{xu}u_t' - \Pi_{xs}s_t')$.

Differentiating (5.7) with respect to $\beta$, we obtain

(5.8) $\partial/\partial\beta = \Pi_{xs}\lambda' + 2\phi\Phi_{xx}\beta'$.

Setting this equal to zero and multiplying by $\beta$, we have

$\beta\Pi_{xs}\lambda' + 2\phi\beta\Phi_{xx}\beta' = 0$.

By virtue of (4.1) and (3.19), the Lagrange multiplier $\phi$ must be zero. Hence,
as far as the derivatives of (5.7) are concerned the restriction (3.19) does not
enter. The setting of the derivatives of (5.7) equal to zero and (4.1) will define
$\hat\beta$ except for a constant of proportionality which is finally determined by (3.19).
For convenience in deriving the estimates we shall use the normalization

(5.9) $\beta\Omega_{xx}\beta' = 1$.

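The derivatives of (5.7) with respect to the coordinates of $\Omega_{xx}^{-1}$, $\Pi^*_{xu}$, $\Pi_{xs}$, and
$\beta$ are set equal to zero, resulting in

(5.10) $\hat\Omega_{xx} = M_{xx} - M_{xs}\hat\Pi_{xs}' - M_{xu}\hat\Pi^{*\prime}_{xu} - \hat\Pi_{xs}M_{sx} - \hat\Pi^*_{xu}M_{ux} + \hat\Pi_{xs}M_{ss}\hat\Pi_{xs}' + \hat\Pi^*_{xu}M_{uu}\hat\Pi^{*\prime}_{xu}$,

(5.11) $\hat\Omega_{xx}^{-1}(M_{xs} - \hat\Pi_{xs}M_{ss}) + \hat\beta'\hat\lambda = 0$,

(5.12) $\hat\Omega_{xx}^{-1}(M_{xu} - \hat\Pi^*_{xu}M_{uu}) = 0$,

(5.13) $\hat\Pi_{xs}\hat\lambda' = 0$.

Solving (5.12) for $\hat\Pi^*_{xu}$, we obtain

(5.14) $\hat\Pi^*_{xu} = P_{xu}$,

defined by (4.12). Solving (5.11) for $\hat\Pi_{xs}$, we obtain

(5.15) $\hat\Pi_{xs} = P_{xs} + \hat\Omega_{xx}\hat\beta'\hat\lambda M_{ss}^{-1}$.

Multiplying (5.15) by $\hat\beta$ and solving for $\hat\lambda$, we obtain

(5.16) $\hat\lambda = -\hat\beta P_{xs}M_{ss}$.

Substitution into (5.15) gives

(5.17) $\hat\Pi_{xs} = (I - \hat\Omega_{xx}\hat\beta'\hat\beta)P_{xs}$.

In view of (5.14) and (5.17) we can write (5.10) as

(5.18) $\hat\Omega_{xx} = W_{xx} + \hat\Omega_{xx}\hat\beta'\hat\beta P_{xs}M_{ss}P_{xs}'\hat\beta'\hat\beta\hat\Omega_{xx}$.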

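Let

(5.19) $\hat\beta P_{xs}M_{ss}P_{xs}'\hat\beta' = \mu$.

Then multiplication of (5.18) on the right by $\hat\beta'$ with use of (5.9) gives

$\hat\Omega_{xx}\hat\beta' = W_{xx}\hat\beta' + \hat\Omega_{xx}\hat\beta'\hat\beta P_{xs}M_{ss}P_{xs}'\hat\beta' = W_{xx}\hat\beta' + \mu\hat\Omega_{xx}\hat\beta'$;

that is,

(5.20) $\hat\Omega_{xx}\hat\beta' = \dfrac{1}{1 - \mu}W_{xx}\hat\beta'$.

Equation (5.13) can be written as

(5.21) $P_{xs}M_{ss}P_{xs}'\hat\beta' - \mu\hat\Omega_{xx}\hat\beta' = 0$

by substitution from (5.16), (5.17) and (5.19). Combining (5.20) and (5.21) we
obtain

(5.22) $(P_{xs}M_{ss}P_{xs}' - \nu W_{xx})\hat\beta' = 0$,

where

(5.23) $\nu = \mu/(1 - \mu)$.

For (5.22) to have a solution, $\nu$ must be a root of

(4.14) $|P_{xs}M_{ss}P_{xs}' - \nu W_{xx}| = 0$.

Substituting from (5.20) into (5.18) we obtain

(5.24) $\hat\Omega_{xx} = W_{xx} + \dfrac{\mu}{(1 - \mu)^2}W_{xx}\hat\beta'\hat\beta W_{xx} = W_{xx} + \nu(1 + \nu)W_{xx}\hat\beta'\hat\beta W_{xx}$.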



To determine which root of (4.14) to use we shall compute the value of the
likelihood function when these estimates are used. It will be convenient to use
the solution $b$ of (4.13) with normalization (5.5). Thus $b$ is proportional to $\hat\beta$;
in fact, since

$\hat\beta W_{xx}\hat\beta' = 1 - \mu$

from (5.20), we see that

$\hat\beta = b\sqrt{1 - \mu} = b/\sqrt{1 + \nu}$.

Let the other solutions of (4.13) be $b_2, \cdots, b_H$, with corresponding roots
$\nu_2, \cdots, \nu_H$, and

$B^* = \begin{pmatrix} b \\ b_2 \\ \vdots \\ b_H \end{pmatrix}$.

Since

(5.25) $|\hat\Omega_{xx}| = |W_{xx} + \nu W_{xx}b'bW_{xx}|$,

we have

(5.26) $|B^*| \cdot |\hat\Omega_{xx}| \cdot |B^{*\prime}| = |I + \nu B^*W_{xx}b'bW_{xx}B^{*\prime}|$.

Since

$bW_{xx}B^{*\prime} = (1, 0, \cdots, 0)$,

and since

$|B^*| \cdot |W_{xx}| \cdot |B^{*\prime}| = 1$,

we deduce from (5.26)

$|\hat\Omega_{xx}| = |W_{xx}|(1 + \nu)$.

Multiplying (5.10) by $\hat\Omega_{xx}^{-1}$, taking the trace, and substituting in (5.6) we obtain

(5.27) $L = (2\pi e)^{-\frac{1}{2}TH}\,|W_{xx}|^{-\frac{1}{2}T}(1 + \nu)^{-\frac{1}{2}T}$.

This is a maximum if $\nu$ is the smallest root of (4.14).

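The theorem now results. The expression for $\hat\sigma^2$ follows from

$\hat\sigma^2 = \hat\beta\hat\Omega_{xx}\hat\beta' = b\hat\Omega_{xx}b'/b\hat\Phi_{xx}b'$.

If $\Phi_{xx}$ is a known constant matrix, $\hat\Phi_{xx} = \Phi_{xx}$; if $\Phi_{xx}$ is a function of the
parameters, $\hat\Phi_{xx}$ is the same function of the estimates.

If we define

(5.28) $\hat\gamma^* = -\hat\beta\hat\Pi^*_{xu}$,

we have by (4.9)

(5.29) $\hat\gamma = \hat\gamma^* + \hat\beta\hat\Pi_{xs}M_{vu}M_{uu}^{-1}$.

Since $\hat\beta$ annihilates $\hat\Pi_{xs}$, (5.3) results.

The estimate of $\Pi_{xs}$ is given by (5.17) and the estimate of $\Omega_{xx}$ is

(5.30) $\hat\Omega_{xx} = W_{xx} + \nu W_{xx}b'bW_{xx}$.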
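The determinant relation $|\hat\Omega_{xx}| = |W_{xx}|(1 + \nu)$ used in passing from (5.26) to (5.27) depends only on the normalization (5.5). The following Python sketch checks it on made-up numbers for $H = 2$; the matrix $W$, the value of $\nu$ and the vector $b$ are arbitrary illustrative choices (in particular $b$ is not obtained from (4.13) here, only scaled so that $bW_{xx}b' = 1$).

```python
def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

W = [[2.0, 1.0], [1.0, 3.0]]      # stands in for W_xx
nu = 0.5
b = [2 ** -0.5, 0.0]              # scaled so that b W b' = 1, as in (5.5)

Wb = [sum(W[i][k] * b[k] for k in range(2)) for i in range(2)]   # the column W b'
bWb = sum(b[i] * Wb[i] for i in range(2))

# (5.30): Omega_hat = W + nu * W b' b W
Omega = [[W[i][j] + nu * Wb[i] * Wb[j] for j in range(2)] for i in range(2)]
b_Omega_b = sum(b[i] * Omega[i][j] * b[j] for i in range(2) for j in range(2))
```

The quantity `b_Omega_b` equals $1 + \nu$, which is the numerator appearing in (5.4).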

6. The likelihood ratio test of restrictions. It has been assumed that the
selected structural equation is identified by imposing the restrictions that certain
coefficients are zero. It was noted in Section 3 that at least $G - 1$ such
restrictions are necessary. If $D$, the number of restrictions on the predetermined
variables, is more than $H - 1$, we can test the hypothesis that these $D$ coefficients
are zero against the alternative that only a smaller number are zero. This is
equivalent to a test that $\Pi_{xs}$ is of rank $H - 1$ against the alternative that the
rank is $H$.

It can be seen intuitively that the smallest root $\nu$ of (4.14) indicates how near
$P_{xs}$ is to being singular. This statistic can be used to test the hypothesis that
$\Pi_{xs}$ is of rank $H - 1$. The test is similar to the test of rank suggested by P. L.
Hsu [8]. The test is stated precisely in the following theorem:

Theorem 2. Under assumptions A, B, C, and D the likelihood ratio criterion
for testing the hypothesis that $\Pi_{xs}$ is of rank $H - 1$ against the alternative that it is
of rank $H$ is

(6.1) $(1 + \nu)^{-\frac{1}{2}T}$,

where $\nu$ is the smallest root of (4.14).

Proof. If there is no restriction on $\Pi_{xs}$, the maximum likelihood estimate of
$\Pi_{xs}$ is $P_{xs}$, of $\Pi^*_{xu}$ is $P_{xu}$, and of $\Omega_{xx}$ is $W_{xx}$. Then the likelihood function is

(6.2) $(2\pi e)^{-\frac{1}{2}TH}\,|W_{xx}|^{-\frac{1}{2}T}$.

The ratio between this and the likelihood function (5.27) maximized under the
hypothesis that the rank of $\Pi_{xs}$ is $H - 1$ is (6.1).

It is proved in the paper following the present one that under certain conditions
(more general than those of Theorem 2)

(6.3) $-2\log[(1 + \nu)^{-\frac{1}{2}T}] = T\log(1 + \nu)$

is distributed asymptotically as $\chi^2$ with $D - H + 1$ degrees of freedom. Thus
an approximate test of significance is given by comparing (6.3) with a significance
point of the $\chi^2$-distribution with degrees of freedom equal to the excess number of
coefficients required to be zero (i.e., the number beyond the minimum required
for identification).
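The approximate test based on (6.3) can be sketched as follows. This Python fragment is an illustration only: the values of $\nu$, $T$, $D$ and $H$ are made up, the function names are hypothetical, and the $\chi^2$ tail probability is computed from the standard finite-sum expressions for integer degrees of freedom rather than taken from a table.

```python
import math

def chi2_sf(x, df):
    """Upper tail probability of the chi-square distribution, integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc term plus a finite sum over odd powers of sqrt(x).
    chi = math.sqrt(x)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return (math.erfc(chi / math.sqrt(2))
            + math.sqrt(2 / math.pi) * math.exp(-x / 2) * total)

def overidentification_test(nu, T, D, H):
    """Statistic T*log(1+nu) of (6.3), its degrees of freedom D-H+1, and the
    approximate tail probability from the asymptotic chi-square distribution."""
    df = D - H + 1
    if df < 1:
        raise ValueError("no excess restrictions to test when D = H - 1")
    stat = T * math.log(1 + nu)
    return stat, df, chi2_sf(stat, df)

stat, df, p = overidentification_test(nu=0.004, T=500, D=3, H=2)
```

A small tail probability would be evidence against the hypothesis that all $D$ excluded coefficients are zero; since the distribution is only asymptotic, the result should be read as approximate.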





7. Computational procedure. The estimation procedure in sections 4 and 5 
does not indicate the most efficient method for computing those estimates. The 
procedure given here is believed to be efficient for ordinary computational equip¬ 
ment and can easily be adapted for sequence-controlled computing machines. 

Let us see what expressions occur in the estimation procedure for $\beta$ and $\gamma$.
We find that we must first know $P_{xs}M_{ss}P_{xs}'$, $W_{xx}$, and $P_{xu}$; these will suffice
if $\Phi_{xx}$ is constant or $\Omega_{xx}$ to estimate $\beta$, $\gamma$, and $\sigma^2$. In what follows, we shall
assume the normalization is $\beta^1 = 1$, as the results for other normalizations
follow immediately. Examining the estimation equations, we see that we may 
use any matrices proportional to the moment matrices. If equation (3.11) 
has a constant term, it is better to use moments about the mean and estimate 
the constant term by setting the calculated mean of the disturbances equal to 
zero. One possible method of correcting for the mean is to calculate 

(7.1) $m_{pq} = T\sum_t p_t q_t - \sum_t p_t \sum_t q_t$.

The estimation procedure for $\beta$, $\sigma^2$, and the remainder of $\gamma$ is not affected by
correcting for the mean. The computational procedure indicated here is
unchanged except for a factor of proportionality in the equation for $\hat\sigma^2$ if a
different form of correction for the mean is used.

7.1. Calculation of $M_{xz}M_{zz}^{-1}M_{zx}$ and $W_{xx}$. It is known that

(7.2) $W_{xx} = M_{xx} - M_{xz}M_{zz}^{-1}M_{zx}$.

We shall use (7.2) to compute $W_{xx}$. We shall compute $M_{xz}M_{zz}^{-1}M_{zx}$ by the
method given by Dwyer [4]. Let us denote the element in the $i$th row and
$j$th column of $M_{zz}$ by $a_{ij}$, and the element in the $i$th row and $j$th column of
$M_{zx}$ by $b_{ij}$. Let us construct the following array

$c_{11}\ c_{12}\ \cdots\ c_{1K}$     $e_{11}\ e_{12}\ \cdots\ e_{1H}$
$d_{11}\ d_{12}\ \cdots\ d_{1K}$     $f_{11}\ f_{12}\ \cdots\ f_{1H}$
      $c_{22}\ \cdots\ c_{2K}$     $e_{21}\ e_{22}\ \cdots\ e_{2H}$
      $d_{22}\ \cdots\ d_{2K}$     $f_{21}\ f_{22}\ \cdots\ f_{2H}$
            $\cdots$
              $c_{KK}$     $e_{K1}\ e_{K2}\ \cdots\ e_{KH}$
              $d_{KK}$     $f_{K1}\ f_{K2}\ \cdots\ f_{KH}$

where

$c_{ij} = a_{ij} - \sum_{k<i} d_{ki}c_{kj}$,     $1 \le i \le j \le K$,

$e_{ij} = b_{ij} - \sum_{k<i} d_{ki}e_{kj}$,     $1 \le i \le K$, $1 \le j \le H$,

$d_{ij} = c_{ij}/c_{ii}$,     $1 \le i \le j \le K$,

$f_{ij} = e_{ij}/c_{ii}$,     $1 \le i \le K$, $1 \le j \le H$.

Then the element in the $i$th row and $j$th column of the symmetric matrix
$M_{xz}M_{zz}^{-1}M_{zx}$ is

$\sum_{k=1}^{K} e_{ki}f_{kj}$.

If we wish to estimate several equations in the system by this method, this
step need only be done once, as $M_{xz}M_{zz}^{-1}M_{zx}$ and $W_{xx}$ do not depend upon the
equation (except that $x$ would be enlarged).
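The array computation of section 7.1 can be written out directly. The following Python sketch follows the recurrences for $c_{ij}$, $e_{ij}$, $d_{ij}$ and $f_{ij}$ as stated above, with indices shifted to start at zero; the function name `doolittle_quadratic_form` and the small test matrices are illustrative assumptions, and the matrix `a` is assumed symmetric with non-zero pivots $c_{ii}$.

```python
def doolittle_quadratic_form(a, b):
    """Return (d, f, q) with q = b' a^{-1} b, for symmetric a (K x K) and
    b (K x H), using the forward elimination of section 7.1."""
    K, H = len(a), len(b[0])
    c = [[0.0] * K for _ in range(K)]
    d = [[0.0] * K for _ in range(K)]
    e = [[0.0] * H for _ in range(K)]
    f = [[0.0] * H for _ in range(K)]
    for i in range(K):
        for j in range(i, K):
            c[i][j] = a[i][j] - sum(d[k][i] * c[k][j] for k in range(i))
        for j in range(H):
            e[i][j] = b[i][j] - sum(d[k][i] * e[k][j] for k in range(i))
        for j in range(i, K):
            d[i][j] = c[i][j] / c[i][i]
        for j in range(H):
            f[i][j] = e[i][j] / c[i][i]
    # Element (i, j) of b' a^{-1} b is sum_k e_ki * f_kj.
    q = [[sum(e[k][i] * f[k][j] for k in range(K)) for j in range(H)]
         for i in range(H)]
    return d, f, q

a = [[4.0, 2.0], [2.0, 3.0]]      # plays the role of M_zz
b = [[2.0, 0.0], [1.0, 1.0]]      # plays the role of M_zx
d, f, q = doolittle_quadratic_form(a, b)
```

No explicit inverse of `a` is formed, which is the point of the scheme on a desk calculator and remains numerically convenient.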

7.2. Computation of $P_{xu}$. We shall compute $P_{xu}$ by the abbreviated Doolittle
method. Let us now denote the element in the $i$th row and $j$th column of
$M_{uu}$ by $a_{ij}$, of $M_{ux}$ by $b_{ij}$. Then let us perform the previous operations, not
including the last step. We may arrange the work, if only one equation is to
be estimated, so that this is already done. Then define

$g_{ij} = f_{ij} - \sum_{k>i} d_{ik}g_{kj}$,     $1 \le i \le F$, $1 \le j \le H$.

Then the element in the $i$th row and $j$th column of $P_{xu}$ is $g_{ji}$.
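The back solution of section 7.2 can be sketched in the same way. In the following Python fragment the forward elimination is repeated so that the block is self-contained; the summation over $k > i$ in the back substitution is the usual form for a unit upper-triangular system and is assumed here. The function name and the test matrices are illustrative.

```python
def doolittle_solve(a, b):
    """Solve a g = b for symmetric a (F x F) and b (F x H): forward elimination
    as in section 7.1, then back substitution g_ij = f_ij - sum_{k>i} d_ik g_kj."""
    F, H = len(a), len(b[0])
    c = [[0.0] * F for _ in range(F)]
    d = [[0.0] * F for _ in range(F)]
    e = [[0.0] * H for _ in range(F)]
    f = [[0.0] * H for _ in range(F)]
    for i in range(F):
        for j in range(i, F):
            c[i][j] = a[i][j] - sum(d[k][i] * c[k][j] for k in range(i))
        for j in range(H):
            e[i][j] = b[i][j] - sum(d[k][i] * e[k][j] for k in range(i))
        for j in range(i, F):
            d[i][j] = c[i][j] / c[i][i]
        for j in range(H):
            f[i][j] = e[i][j] / c[i][i]
    g = [[0.0] * H for _ in range(F)]
    for i in reversed(range(F)):          # last row first
        for j in range(H):
            g[i][j] = f[i][j] - sum(d[i][k] * g[k][j] for k in range(i + 1, F))
    return g

M_uu = [[4.0, 2.0], [2.0, 3.0]]           # made-up moment matrices
M_ux = [[2.0, 0.0], [1.0, 1.0]]
g = doolittle_solve(M_uu, M_ux)            # g = M_uu^{-1} M_ux
P_xu = [list(col) for col in zip(*g)]      # P_xu is the transpose of g
```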

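7.3. Computation of $P_{xs}M_{ss}P_{xs}'$. We know that

(7.3) $P_{xs}M_{ss}P_{xs}' = M_{xz}M_{zz}^{-1}M_{zx} - M_{xu}M_{uu}^{-1}M_{ux}$.

Let us compute $P_{xs}M_{ss}P_{xs}'$, using (7.3). We must first calculate $M_{xu}M_{uu}^{-1}M_{ux}$.
We may do this either by the method of section 7.1, or as $P_{xu}M_{ux}$.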

7.4. Computation of $\beta$, $\nu$, and $\gamma$. We shall use

(5.3) $\gamma = -\beta P_{xu}$

to compute $\gamma$ after $\beta$ has been computed.

Case 1) $H = 1$. In this case the vector $\beta = (1)$, $\nu = P_{xs}M_{ss}P_{xs}'/W_{xx}$.

Case 2) $H = 2$, $D > 1$. Let $a_{ij}$ denote the element in the $i$th row and $j$th
column of $P_{xs}M_{ss}P_{xs}'$, $w_{ij}$ the element in the $i$th row and $j$th column of $W_{xx}$.
Define

$k_0 = |P_{xs}M_{ss}P_{xs}'|$,

$k_1 = |W_{xx}|$,

$k_2 = \tfrac{1}{2}(a_{11}w_{22} + a_{22}w_{11} - 2a_{12}w_{12})$.

Then

$\nu = \dfrac{k_2 - \sqrt{k_2^2 - k_0 k_1}}{k_1}$.

Let $\theta = P_{xs}M_{ss}P_{xs}' - \nu W_{xx}$. Then

$\beta_1 = 1$,

$\beta_2 = -\dfrac{\theta_{11}}{\theta_{12}} = -\dfrac{\theta_{12}}{\theta_{22}}$.

Case 3) $H = 2$, $D = 1$. In this case $\nu = 0$. Then $\theta = P_{xs}M_{ss}P_{xs}'$, and $\beta$
may be computed as before.
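For $H = 2$ the closed form above is easy to check numerically. The following Python sketch is illustrative only; the function name `two_variable_root_and_beta` and its argument names are assumptions, `a` stands for $P_{xs}M_{ss}P_{xs}'$ and `w` for $W_{xx}$, and $\theta_{22}$ is assumed non-zero.

```python
import math


def two_variable_root_and_beta(a, w):
    """Smallest root nu of |a - nu*w| = 0 and beta = (1, beta_2) for 2 x 2 inputs.

    Hypothetical helper: a and w are symmetric 2 x 2 matrices given as
    nested lists; theta_22 is assumed non-zero.
    """
    k0 = a[0][0] * a[1][1] - a[0][1] ** 2                  # |a|
    k1 = w[0][0] * w[1][1] - w[0][1] ** 2                  # |w|
    k2 = 0.5 * (a[0][0] * w[1][1] + a[1][1] * w[0][0] - 2.0 * a[0][1] * w[0][1])
    # guard against a slightly negative discriminant caused by rounding
    disc = max(k2 * k2 - k0 * k1, 0.0)
    nu = (k2 - math.sqrt(disc)) / k1
    theta12 = a[0][1] - nu * w[0][1]
    theta22 = a[1][1] - nu * w[1][1]
    return nu, (1.0, -theta12 / theta22)


if __name__ == "__main__":
    print(two_variable_root_and_beta([[2.0, 1.0], [1.0, 2.0]],
                                     [[1.0, 0.0], [0.0, 1.0]]))
```

The quadratic being solved is $k_1\nu^2 - 2k_2\nu + k_0 = 0$, which is the expansion of $|a - \nu w| = 0$ for $2 \times 2$ matrices; the minus sign before the square root selects the smaller root.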

Case 4) $H > 2$, $D > H - 1$. Using the procedure of section 7.2, compute
$A = (P_{xs}M_{ss}P_{xs}')^{-1}W_{xx}$. Let us multiply equation (5.22) by
$-\frac{1}{\nu}(P_{xs}M_{ss}P_{xs}')^{-1}$ and set $1/\nu = \lambda$. We obtain

(7.4) $(A - \lambda I)\beta' = 0$,

where $\lambda$ is the largest characteristic root of $A$. Then we may employ the
method of Aitken [1] to estimate $\lambda$ and $\beta$. Let $q_0$ be an approximation to
$\beta'$. The column of $A$ with largest absolute values is generally a satisfactory
approximation. Define

$q_i = A q_{i-1}$,

$\lambda_{ij} = q_{ij} / q_{i-1,j}$,

where $q_{ij}$ is the $j$th component of $q_i$. The quantities $\lambda_{ij}$ approach $\lambda$ as $i$ increases, and the normalized vectors $q_i$
approach $\beta'$. The convergence may be accelerated by the methods given by
Aitken. The normalization should not be carried out until the $\lambda_{ij}$ are sufficiently
close for different $j$.
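The iteration of Case 4 is the familiar power method. The sketch below is a plain, unaccelerated version under stated assumptions: the function name `dominant_root` is hypothetical, the component ratios used as root estimates are the usual power-method ratios rather than a transcription of Aitken's scheme, "the column with largest absolute values" is interpreted as the column with the largest sum of absolute values, and the final normalization (first coordinate equal to 1) is only one possible choice.

```python
def dominant_root(a, iterations=50):
    """Estimate the largest characteristic root of a and its vector.

    Hypothetical helper: plain power iteration q_i = A q_{i-1}, with no
    Aitken acceleration. Normalization is postponed until the end.
    """
    n = len(a)
    # starting vector: the column of a with the largest sum of absolute values
    col = max(range(n), key=lambda j: sum(abs(a[i][j]) for i in range(n)))
    q = [a[i][col] for i in range(n)]
    ratios = []
    for _ in range(iterations):
        q_next = [sum(a[i][k] * q[k] for k in range(n)) for i in range(n)]
        # one root estimate per component with a non-zero previous value
        ratios = [q_next[j] / q[j] for j in range(n) if q[j] != 0.0]
        q = q_next
    lam = sum(ratios) / len(ratios)
    # normalize only now; assumes the first coordinate is non-zero
    vector = [v / q[0] for v in q]
    return lam, vector, ratios


if __name__ == "__main__":
    lam, vector, ratios = dominant_root([[2.0, 1.0], [1.0, 2.0]])
    print(lam, vector, ratios)
```

Agreement of the entries of `ratios` with one another is the practical signal that the iteration has settled, in the spirit of the remark that normalization should wait until the root estimates agree across components.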

Case 5) $H > 2$, $D = H - 1$. Let us go through the procedure of section 7.2
with $A = P_{xs}M_{ss}P_{xs}'$, and with no matrix $B$. Then $c_{HH} = 0$. Set $g_H = 1$,
and compute

$g_i = -\sum_{i<k\le H} d_{ik} g_k$.

Then 

.= 0 . 

Qi 

7.5. Computation of We have 
(7.5) = ^xz^' = (1 + vWxz^'- 

If we use the instead of the m% we must divide by and if other factors 
of proportionality are used, we must divide by them. The estimate is in general biased,
but the bias depends upon the nature of the complete system, and is not easy to
calculate. The bias is of the order of $1/T$.

8. Confidence regions based on small sample theory.* If all of the predetermined
variables in the system are exogenous (i.e., "fixed"), we can obtain
confidence regions for the coefficients of one equation on the basis of small sample
theory. To do this we require only that the disturbance of the selected equation
be normally distributed; that is, the linear form in the observations $\beta x_t' + \gamma u_t'$
is normally distributed with mean zero and variance $\sigma^2$. The regression of this
on fixed variates is normally distributed and certain quadratic forms in these
linear forms have $\chi^2$-distributions. On the basis of this we can set up confidence
regions for the coefficients.

* We are indebted to Professor A. Wald for assistance in simplifying our approach to this
problem.

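In addition to assumptions A and B we use the following:

Assumption E. All of the coordinates of $z_t = (u_t \; v_t)$ are exogenous. The
moment matrix $M_{zz}$ is non-singular. The disturbances of the selected equation are
distributed independently and normally with mean 0 and variance $\sigma^2$.

Suppose we have a set of observations $(x_1, u_1, v_1), \cdots, (x_T, u_T, v_T)$. If
we know $\beta$ and $\gamma$ we can obtain $T$ values of

(8.1) $w_t = \beta x_t' + \gamma u_t'$, $t = 1, \cdots, T$.

The sample regression coefficients of $w_t$ on $u_t$ and $s_t$ are

(8.2) $c = \frac{1}{T}\sum_{t=1}^{T} w_t u_t M_{uu}^{-1} = \beta M_{xu}M_{uu}^{-1} + \gamma$,

(8.3) $e = \frac{1}{T}\sum_{t=1}^{T} w_t s_t M_{ss}^{-1} = \beta M_{xs}M_{ss}^{-1}$.

The two vectors $c$ and $e$ are distributed independently and normally with mean 0
and covariance matrices

(8.4) $\mathcal{E}(c'c) = \frac{\sigma^2}{T} M_{uu}^{-1}$,

(8.5) $\mathcal{E}(e'e) = \frac{\sigma^2}{T} M_{ss}^{-1}$.

Hence (by usual regression theory)

(8.6) $C = \frac{T}{\sigma^2}\, c M_{uu} c' = \frac{T}{\sigma^2}\,[\beta M_{xu}M_{uu}^{-1}M_{ux}\beta' + \beta M_{xu}\gamma' + \gamma M_{ux}\beta' + \gamma M_{uu}\gamma']$,

(8.7) $E = \frac{T}{\sigma^2}\, e M_{ss} e' = \frac{T}{\sigma^2}\, \beta M_{xs}M_{ss}^{-1}M_{sx}\beta'$,

(8.8) $\frac{T}{\sigma^2}\left(\frac{1}{T}\sum_{t=1}^{T} w_t^2\right) - C - E = \frac{T}{\sigma^2}\, \beta W_{xx}\beta'$

are distributed independently as $\chi^2$ with $F$, $D$, and $T - K$ degrees of freedom,
respectively. The ratio of any two (each divided by its number of degrees of freedom) has an $F$-distribution.

On the basis of these considerations we can obtain the desired confidence
regions.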

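Theorem 3. Suppose assumptions A, B, and E are true. If the normalization
is

(8.9) $\beta \Phi_{xx} \beta' = 1$,

where $\Phi_{xx}$ is a given matrix, (a) a confidence region for $\beta$ of confidence $\epsilon$ consists of
all $\beta^*$ satisfying (8.9) and

(8.10) $\dfrac{\beta^* M_{xs}M_{ss}^{-1}M_{sx}\beta^{*\prime}}{\beta^* W_{xx}\beta^{*\prime}} \cdot \dfrac{T - K}{D} \le F_{D,T-K}(\epsilon)$,

where $F_{D,T-K}(\epsilon)$ is chosen so the probability of (8.10) for $\beta^* = \beta$ is $\epsilon$. (b) A
confidence region for $\beta$ and $\gamma$ simultaneously consists of all $\beta^*$ and $\gamma^*$ satisfying
(8.9) and

(8.11) $\dfrac{\beta^* M_{xu}M_{uu}^{-1}M_{ux}\beta^{*\prime} + \beta^* M_{xu}\gamma^{*\prime} + \gamma^* M_{ux}\beta^{*\prime} + \gamma^* M_{uu}\gamma^{*\prime} + \beta^* M_{xs}M_{ss}^{-1}M_{sx}\beta^{*\prime}}{\beta^* W_{xx}\beta^{*\prime}} \cdot \dfrac{T - K}{K} \le F_{K,T-K}(\epsilon)$.

(c) If the normalization is $\sigma^2 = 1$, then a confidence region for $\beta$ of confidence
$\epsilon_1 \epsilon_2$ consists of all $\beta^*$ satisfying

(8.12) $T \beta^* M_{xs}M_{ss}^{-1}M_{sx}\beta^{*\prime} \le \chi^2_D(\epsilon_1)$,

(8.13) $\chi^2_{T-K}(\epsilon_2') \le T \beta^* W_{xx}\beta^{*\prime} \le \chi^2_{T-K}(\epsilon_2'')$,

where $\chi^2_D(\epsilon_1)$ is chosen so that the probability of (8.12) is $\epsilon_1$ when $\beta^* = \beta$ and $\chi^2_{T-K}(\epsilon_2')$
and $\chi^2_{T-K}(\epsilon_2'')$ are chosen so that the probability of (8.13) is $\epsilon_2$ when $\beta^* = \beta$ and

(8.14) $\chi^2(\epsilon_2') < 1 < \chi^2(\epsilon_2'')$.

(d) A confidence region for $\beta$ and $\gamma$ simultaneously consists of all $\beta^*$ and $\gamma^*$ satisfying
(8.13) and

(8.15) $T[\beta^* M_{xu}M_{uu}^{-1}M_{ux}\beta^{*\prime} + \beta^* M_{xu}\gamma^{*\prime} + \gamma^* M_{ux}\beta^{*\prime} + \gamma^* M_{uu}\gamma^{*\prime} + \beta^* M_{xs}M_{ss}^{-1}M_{sx}\beta^{*\prime}] \le \chi^2_K(\epsilon_1)$.

Region (c) is the interior of an ellipsoid and an ellipsoidal shell in the $\beta^*$-space;
region (d) is similar in the $\beta^*$, $\gamma^*$-space. Region (a) consists of the intersection
of the quadric surface (8.9) and the interior of a cone in the $\beta^*$-space; region (b)
is similar in the $\beta^*$, $\gamma^*$-space.

It is clear that there are many other ways of constructing confidence regions
by taking regression on other fixed variates. Of these the best seem to be those
of theorem 3. It has been proved [2] that the regions of theorem 3 are consistent
in the sense that for sufficiently large $T$ the probability is arbitrarily near 1 that
all of the confidence region is within a certain distance of $\beta$ or $\beta$, $\gamma$. For an
application of this technique to economic data see a paper by Bartlett [3] who
suggested this method independently.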


9. An approximate small sample test of restrictions. When $\beta^* = \beta$, the
probability of (8.10) is $\epsilon$. If $\beta^*$ is replaced by $\hat\beta$ which minimizes the expression
on the left, the probability is at least as great; it is, say, $1 - \delta$. This ratio is $\lambda$,
the smallest root of

(9.1) $\left| M_{xs}M_{ss}^{-1}M_{sx} - \lambda \dfrac{D}{T - K} W_{xx} \right| = 0$.

Since

(9.2) $\lambda = \dfrac{T - K}{TD}\, \nu$,

where $\nu$ is the smallest root of (4.14), the probability of

(9.3) $\nu > \dfrac{TD}{T - K} F_{D,T-K}(\epsilon)$

is $\delta \le (1 - \epsilon)$. We summarize this as follows:

Theorem 4. Under assumptions A, B, and E, the inequality (9.3), where $\nu$ is
the smallest root of (4.14), constitutes a test of the hypothesis that the coefficients of
$v_t$ in the selected structural equation are zero of significance less than $1 - \epsilon$.

This test is simply an approximation to the test given in section 6. The
exact probability, $\delta$, of (9.3) is unknown; in fact the distribution of $\nu$ depends on
$\Pi_{xv}$ and the distribution of $\delta_t$. However, since $\delta$ lies between 0 and $1 - \epsilon$, we
know that if the test is used as though the level were $1 - \epsilon$, the test will be
"conservative."

Another approximate test of the restrictions can be obtained from the inequality
(8.11). If the hypothesis is rejected on the basis of one of these tests,
the corresponding confidence region (for $\beta$ or for $\beta$ and $\gamma$) is imaginary, for all
$\beta$ or $\beta$ and $\gamma$ are excluded. It should be noticed that the use of a given ratio
to test the hypothesis at significance level $\delta$ ($\le 1 - \epsilon$) does not affect the confidence
coefficient $\epsilon$ of the confidence region when the hypothesis is true.


REFERENCES

[1] A. C. Aitken, "Studies in practical mathematics II. The evaluation of the latent
roots and latent vectors of a matrix," Edinb. Math. Soc. Proc., Vol. 57 (1936-7),
pp. 269-305.

[2] T. W. Anderson and Herman Rubin, "The asymptotic properties of estimates of the
parameters of a single equation in a complete system of stochastic equations,"
to be published.

[3] M. S. Bartlett, "A note on the statistical estimation of demand and supply relations
from time series," Econometrica, Vol. 16 (1948), pp. 323-329.

[4] P. S. Dwyer, "Evaluation of linear forms," Psychometrika, Vol. 6 (1941), pp. 355-365.

[5] R. A. Fisher, "The statistical utilization of multiple measurements," Annals of
Eugenics, Vol. 8 (1938), pp. 376-386.

[6] M. A. Girshick and T. Haavelmo, "Statistical analysis of the demand for food:
examples of simultaneous estimation of structural equations," Econometrica,
Vol. 15 (1947), pp. 79-110.

[7] T. Haavelmo, "Statistical implications of a system of simultaneous equations,"
Econometrica, Vol. 11 (1943), pp. 1-12.

[8] P. L. Hsu, "On the problem of rank and the limiting distribution of Fisher's test
function," Annals of Eugenics, Vol. 11 (1941), pp. 39-41.

[9] H. B. Mann and A. Wald, "On the statistical treatment of linear stochastic difference
equations," Econometrica, Vol. 11 (1943), pp. 173-220.

[10] Olav Reiersøl, "Confluence analysis by means of lag moments and other methods of
confluence analysis," Econometrica, Vol. 9 (1941), pp. 1-24.

[11] Statistical Inference in Dynamic Economic Systems, to be published as Cowles
Commission Monograph No. 10.



SOME SIGNIFICANCE TESTS FOR THE MEDIAN WHICH ARE
VALID UNDER VERY GENERAL CONDITIONS*

By John E. Walsh
The Rand Corporation

1. Summary. Order statistics are used to derive significance tests for the
population median which are valid under very general conditions. These tests
are approximately as powerful as the Student $t$-test for small samples from a
normal population. Also the application of a test requires very little computation.
Thus the tests derived compare very favorably with the $t$-test for small
sets of observations. Applications of these order statistic tests to certain well
known statistical problems are given in another paper [1].

PART I. RESULTS AND DEFINITIONS

2. Introduction. Consider $n$ independent observations drawn from $n$ populations
satisfying the conditions (A):

1) Each population is continuous (i.e. its cdf is continuous).

2) Each population is symmetrical.

3) The median of each population has the same value $\phi$. (If the 50% point
of a continuous symmetrical population is not unique, the median $\phi$ of the population
is defined to be the midpoint of the segment of 50% values.)

It is to be emphasized that no two of the observations are necessarily drawn
from the same population. Significance tests are derived to compare $\phi$ with a
given constant value $\phi_0$.

A general method of obtaining one-sided and symmetrical tests is given in section 8.
This general method furnishes tests which have significance levels of the
form $r/2^n$, $(r = 1, \cdots, 2^n - 1)$. Each value of $r$ can be attained for some one-sided
test. Unfortunately tests obtained by the general method are very difficult
to apply from a computational viewpoint. If $n > 10$, the number of computations
required for the application of a test is prohibitive.

To overcome the computational difficulty involved in using the general method,
easily applied tests using order statistics are derived. These tests are based on
order statistics of certain combinations of order statistics of the $n$ observations,
each combination being either a single order statistic of the $n$ observations or
one-half the sum of two order statistics. The tests are invariant under permutation
of the $n$ observations and have significance levels of the form $r/2^n$,
$(r = 1, \cdots, 2^n - 1)$. Table 1 contains a list of some one-sided and symmetrical
tests for $n \le 15$ ($x_1, \cdots, x_n$ represent the $n$ observations arranged in increasing
order of magnitude). Additional significance tests can be obtained by use of
Theorem 4 of section 6.

* The results presented in this paper were obtained in the course of research conducted
under the sponsorship of the Office of Naval Research. This research was performed while
the author was at Princeton University.


If a symmetrical population has a mean, the mean has the same value as the
median. Thus if each population from which an observation is drawn satisfies
the additional condition that its mean exists, the median tests derived in this
paper are also tests of the mean.

Although it is unlikely that conditions (A) are ever exactly satisfied in practice,
these conditions appear to be approximately satisfied in many practical
situations. Moreover conditions (A) are of such a simple form that approximate
verification can frequently be obtained without an extensive investigation.

Certain of the order statistic tests are very efficient if the $n$ observations are a
sample from a normal population. Efficiencies are listed for some of the tests in
Table 1. These tests are approximately as efficient as the Student $t$-test. (The
efficiency of a test, more precisely the power efficiency, is defined in section 3.)

The order statistic tests are competitive with the Student $t$-test. In choosing
between the two types of tests the following considerations may be of interest:

(a) The order statistic tests are valid under much more general conditions than
the $t$-test.

(b) The order statistic tests are almost as efficient as the $t$-test for small samples
from a normal population.

(c) The order statistic tests are more easily computed than the $t$-test.

(d) For the case of a sample from a normal population and near significance
the $t$-test gives more information than the order statistic tests.

In some cases a set of $n$ independent observations satisfying only 1) and 3) of
conditions (A) can be transformed into observations approximately satisfying all
of conditions (A) by an appropriate continuous monotonic change of variable.
For example, replacing each observation by the logarithm of the value of the
observation sometimes results in a set of observations having approximately
symmetrical distributions. Since the transformation, say $g(x)$, is continuous
and monotonic, the resulting observations will have median $g(\phi)$ if the original
observations have median $\phi$. Confidence intervals can be found for $\phi$ by first
obtaining confidence intervals for $g(\phi)$ on the basis of conditions (A) and then
inverting. Significance tests can be obtained from these confidence intervals.
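The transform-then-invert step can be illustrated with a short Python sketch. Everything named here is an assumption made for illustration: `bound_after_transform` and `example_statistic` are hypothetical helpers, the logarithm is only one possible choice of $g$, and the statistic $\max[x_{n-1}, \tfrac{1}{2}(x_{n-2} + x_n)]$ is used merely as an example of a one-sided order statistic bound.

```python
import math


def example_statistic(values):
    """max[x_{n-1}, (x_{n-2} + x_n)/2] on the ordered sample (needs n >= 3)."""
    x = sorted(values)
    return max(x[-2], 0.5 * (x[-3] + x[-1]))


def bound_after_transform(observations, statistic, g=math.log, g_inverse=math.exp):
    """Apply the statistic on the g-scale and map the result back.

    Because g is increasing, statistic(g(y)) < g(phi) holds exactly when
    g_inverse(statistic(g(y))) < phi, so a one-sided bound for g(phi)
    inverts to a one-sided bound for phi.
    """
    transformed = [g(v) for v in observations]
    return g_inverse(statistic(transformed))


if __name__ == "__main__":
    print(bound_after_transform([1.0, 10.0, 100.0, 1000.0], example_statistic))
```

For a decreasing $g$ the direction of the inequality would reverse, so the inverted bound would sit on the other side of $\phi$.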

The tests of Part I can be applied to furnish generalized solutions for several
well known statistical problems. Some of these applications are given in another
paper [1].

One application occurs in cases where there is reason to believe that conditions
(A) are satisfied but there is no reason to assume that the populations from
which the observations were drawn are even approximately the same. Perhaps
the most common situation of this type is that in which the value of a certain
quantity is experimentally determined by several different methods, all of which
should theoretically yield the same result. Then there is no reason to believe
that all the experimental values have the same precision. It may be permissible,
however, to assume that each value is an observation from a continuous symmetrical
population and that all the populations have the same median. Then
the order statistic tests can be used to test the true value of the quantity investigated.
For example, consider the determination of a specified physical constant.
Various scientists obtained experimental values for this constant by several different
methods. If it can be assumed that each value is an observation from a
continuous symmetrical population and that all the populations have the same
median, the true value of the physical constant can be tested by applying the
order statistic tests to the totality of the experimental values.

[TABLE 1. One-sided and symmetrical order statistic tests for $n \le 15$, with significance levels and efficiencies.]

3. Power efficiency of tests. A problem which arises throughout the paper
is that of determining how much information is lost by using some other test in
place of the most powerful test of a given hypothesis. The quantitative measure
of the amount of available information which is used by a test will be given as a
percentage and is called the power efficiency of the test considered.

In all cases investigated the underlying population is normal with unknown
variance and the hypotheses tested concern the population median (mean).
Then the most powerful test (one-sided or symmetrical) is the appropriate
Student $t$-test.

The procedure used to measure the power efficiency of a test is different from
the common method of measuring the efficiency of an estimate. The efficiency
of an estimate is obtained by taking the ratio of the variance of an efficient estimate
with respect to the variance of the given estimate (expressed as a percentage).
The method of determining the power efficiency of a test, however,
consists in continuously varying the sample size of the appropriate most powerful
test (same significance level) until the power functions of the given test and the
most powerful test are equivalent in the following sense: The area between the
two power curves for which the power function of the most powerful test exceeds
the power function of the given test is equal to the analogous area for which the
power function of the most powerful test is less than that of the given test. (It
is assumed that the power functions of the tests can be made to depend on the
values of a single parameter.) The sample size (not necessarily integral) of the
most powerful test with equivalent power function divided by the sample size of the
given test is called the power efficiency of the given test (expressed as a percentage).

In obtaining power efficiencies in the manner defined above, the sample size
of the most powerful test is allowed to assume non-integral values. This furnishes
an interpolated measure of the sample size of the most powerful test which is
power function equivalent to the given test. As pointed out above, the $t$-test
is a most powerful test for the situations considered in this paper. A method of
computing power function values for $t$-tests having non-integral sample sizes is
given below.

The definition of power efficiency selected is very convenient from a computational
point of view. Power function values for the $t$-test can be easily computed
through use of the normal approximation given in [2]. For the significance levels
considered in this paper, the normal approximation is reasonably accurate if
the sample size is not too small. In the remaining cases the approximation
underestimates some power function values and overestimates others. For the
situations investigated, however, the error introduced by this combination of
underestimation and overestimation tends to cancel out in the determination of
power efficiencies if the above area definition of equality of power functions is
used. Thus application of the normal approximation yields reasonably accurate
power efficiencies for the cases considered in this paper. Use of the
normal approximation furnishes an easily applied method of obtaining power
function values for $t$-tests having non-integral sample sizes.

TABLE 2
Efficiencies and power function values for certain order statistic tests

                                Sample   Effi-    Approx.       Values of power function
Significance test               size     ciency   significance
                                         (%)      level         δ = .6   δ = 1.2   δ = 1.8

t                               4.9               .0625         .337     .755      .964
½(x4 + x5) < φ0                 5        98       .0625         .343     .755      .958
t                               5.82              .0469         .327     .779      .980
max[x5, ½(x4 + x6)] < φ0        6        97       .0469         .334     .779      .972
t                               5.88              .0312         .244     .682      .951
½(x5 + x6) < φ0                 6        98       .0312         .254     .687      .942
t                               6.65              .0547         .406     .869      .994
max[x5, ½(x4 + x7)] < φ0        7        95       .0547         .413     .867      .991
t                               6.85              .0234         .239     .716      .969
max[x6, ½(x5 + x7)] < φ0        7        98       .0234         .249     .717      .962
t                               7.55              .0430         .395     .882      .996
max[x6, ½(x4 + x8)] < φ0        8        94.5     .0430         .404     .879      .993
t                               7.85              .0117         .174     .650      .956
max[x7, ½(x6 + x8)] < φ0        8        98       .0117         .185     .656      .949
t                               8.64              .0215         .302     .839      .994
max[x7, ½(x5 + x9)] < φ0        9        96       .0215         .311     .834      .990
t                               8.9               .0059         .127     .597      .947
max[x8, ½(x7 + x9)] < φ0        9        99       .0059         .137     .599      .935
t                               7.5               .0547         .450     .910      .998
x8 < φ0                         10       75       .0547         .454     .901      .995
t                               9.65              .0107         .227     .790      .991
max[x8, ½(x6 + x10)] < φ0       10       96.5     .0107         .237     .786      .986
t                               8.2               .0098         .176     .668      .964
max[x9, ½(x1 + x10)] < φ0       10       82       .0098         .191     .677      .952
t                               8.9               .0059         .141     .621      .954
x10 < φ0                        11       81       .0059         .152     .634      .942
t                               11.22             .0102         .277     .870      .998
max[x9, ½(x6 + x12)] < φ0       12       93.5     .0102         .288     .862      .995

Table 2 contains examples of the above described method of determining power
efficiencies. Here the power function values for the $t$-test were computed using
the normal approximation. Examination of Table 2 shows that the maximum
difference between corresponding power function values for the two types of
tests is small for all the cases considered there. This holds in the determination
of all the power efficiencies listed in Table 1.
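The arithmetic behind the efficiency column is simply the ratio defined in section 3. The Python sketch below recomputes a few entries from the sample sizes listed in Table 2; the function name `power_efficiency` is an assumption, and the sketch does not attempt the area-matching step that produces the equivalent $t$-test sample size in the first place.

```python
def power_efficiency(equivalent_t_sample_size, n):
    """Power efficiency in percent: equivalent t-test sample size over n."""
    return 100.0 * equivalent_t_sample_size / n


if __name__ == "__main__":
    # (equivalent t-test sample size, sample size of the order statistic test)
    rows = [(4.9, 5), (5.82, 6), (7.5, 10), (11.22, 12)]
    for m, n in rows:
        print(n, round(power_efficiency(m, n), 1))
```

For instance, an equivalent $t$-test sample size of 4.9 against $n = 5$ gives 98 per cent, in agreement with the second row of Table 2.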

Investigation indicates that the definition of power efficiency given here is for 
all practical purposes the same as that given in [3]. 

For the situations considered in this paper, it is sufficient to restrict power 
efficiency investigations to one-sided tests. Every symmetric test investigated 
can be considered as a combination of two non-overlapping one-sided tests, 
each having a significance level equal to half that of the symmetric test. Also, 
from symmetry, these one-sided tests (each considered as a separate test) have 
the same power efficiency. Thus it is an immediate consequence of the definition 
of power efficiency that the symmetric test has the same efficiency as each of the 
corresponding one-sided tests at half the significance level. 

PART II. DERIVATIONS 

4. Introduction. The purpose of the remainder of the paper is to present
derivations of the significance test results stated in sections 1 and 2. The first
derivations consist in obtaining confidence intervals for $\phi$ on the basis of conditions
(A). Then properties of these confidence intervals are analyzed. Application
of the confidence intervals and their properties to significance tests furnishes
many of the results stated in sections 1 and 2. The remaining derivations are
concerned with efficiencies and the general method mentioned in section 2.

5. Derivation of confidence intervals. Let us consider $n$ independent observations,
each observation being drawn from a possibly different population.
Denote these observations by $y_1, \cdots, y_n$ and let the cdf of $y_i$ be given by $F_i$,
$(i = 1, \cdots, n)$. Furthermore let the $n$ populations from which these $n$ observations
were drawn satisfy conditions (A). Then 1) of conditions (A) requires
that each $F_i$ is continuous, while 2) and 3) stipulate that

$\int_{-\infty}^{-c} dF_i(y_i - \phi) = \int_{c}^{\infty} dF_i(y_i - \phi)$, $(i = 1, \cdots, n)$,

for all values of $c$ in the interval $-\infty < c < \infty$.

Let $x_1, \cdots, x_n$ represent $y_1, \cdots, y_n$ arranged in increasing order of magnitude.
Since the cdf's are continuous, $\Pr(x_i = x_j;\ i \ne j) = 0$. For the situations
treated in this paper, it is sufficient to consider one-sided confidence intervals
for $\phi$. All one-sided confidence intervals derived have one of the forms

(1) $g(x_1, \cdots, x_n) < \phi$,
    $h(x_1, \cdots, x_n) > \phi$,

where $g$ and $h$ are Borel measurable functions of $x_1, \cdots, x_n$ such that

$\Pr[g(x_1, \cdots, x_n) < \phi] = \Pr[g(x_1 - \phi, \cdots, x_n - \phi) < 0]$,

$\Pr[h(x_1, \cdots, x_n) > \phi] = \Pr[h(x_1 - \phi, \cdots, x_n - \phi) > 0]$.

Consider the additional condition
(B) All populations are the same.

In terms of cumulative distribution functions, condition (B) requires that all
the cdf's $F_i$ are equal to some cdf $F$. A theorem will be proved which shows that
all confidence intervals of the forms (1) derived on the basis of both conditions
(A) and (B) are also valid if only conditions (A) necessarily hold; i.e. if

$\Pr[g(x_1, \cdots, x_n) < \phi] = p$

whenever $x_1, \cdots, x_n$ are order statistics of observations from populations satisfying
conditions (A) and (B), then this probability expression also has the value
$p$ if $x_1, \cdots, x_n$ are from populations necessarily satisfying only conditions (A).
Similarly for $\Pr[h(x_1, \cdots, x_n) > \phi]$.

Theorem 1. Let $Q(x_1 - \phi, \cdots, x_n - \phi)$ be a probability statement involving
$x_1 - \phi, \cdots, x_n - \phi$, which defines a Borel measurable region $R(x_1 - \phi, \cdots, x_n - \phi)$
of the $n$-dimensional order statistic space. If

(2) $Q(x_1 - \phi, \cdots, x_n - \phi) = p$

whenever $x_1, \cdots, x_n$ are order statistics of $n$ independent observations from populations
satisfying conditions (A) and (B), then (2) also holds when $x_1, \cdots, x_n$ are
order statistics of $n$ independent observations from populations necessarily satisfying
only conditions (A).

Proof. It is sufficient to consider the case in which $\phi = 0$. Then, if conditions
(A) are satisfied, the joint probability element of $x_1, \cdots, x_n$ is

$dF(x_1, \cdots, x_n) = \sum_{\pi} dF_1(x_{\pi(1)}) \cdots dF_n(x_{\pi(n)})$,

where the summation is taken over all permutations $\pi$ of the integers $1, \cdots, n$,
and the $F$'s are cdf's of symmetrical populations with zero median. Let $R =
R(x_1, \cdots, x_n)$ be the region of the $n$-dimensional order statistic space defined
by the probability statement $Q(x_1, \cdots, x_n)$. Then Theorem 1 stipulates that

(3) $\int_R dF(x_1, \cdots, x_n) = p$

whenever $y_1, \cdots, y_n$ are from populations satisfying conditions (A) and (B)
with zero median. In this case, however, each $F_i = F$ and (3) becomes

(4) $n! \int_R \prod_{i=1}^{n} dF(x_i) = p$,

where $F$ is the cdf of a population satisfying conditions (A) and (B) with zero
median. Let

$P = \prod_{i=1}^{n} \left( \sum_{j=1}^{n} dF_j(x_i) \right)$

and define $S_\alpha^\beta$ to be the sum of all terms in the expansion of $P$ which contain a
specified $\alpha$ of $dF_1, \cdots, dF_n$ and no others; the particular set chosen is denoted
by $\beta$, where $\beta = 1, \cdots, \binom{n}{\alpha}$. Then

$P = dF(x_1, \cdots, x_n) + \sum_\beta S_{n-1}^\beta + \cdots + \sum_\beta S_1^\beta$.

Now consider any given $S_\alpha^\beta$ (i.e. $\alpha$, $\beta$ given). Define $dH$ to be the sum of the
$\alpha$ of $dF_1, \cdots, dF_n$ pertaining to $\beta$ plus any set of zero or more of the remaining
$dF$'s. Then no matter which of the remaining $dF$'s are chosen for $dH$, the sum
of those terms in the expansion of $\prod_{i=1}^{n} dH(x_i)$ which contain the particular set of
$\alpha$ of $dF_1, \cdots, dF_n$ is always equal to $S_\alpha^\beta$. Let

$P_\alpha = \sum_\beta \left( \prod_{i=1}^{n} dG_\beta(x_i) \right)$,

where $dG_\beta$ equals the sum of the $\alpha$ of $dF_1, \cdots, dF_n$ pertaining to $\beta$. Then from
the above and the symmetrical fashion in which the $dF$'s are treated,

$P_\alpha = \sum_\beta S_\alpha^\beta + K_{\alpha-1}^{(\alpha)} \sum_\beta S_{\alpha-1}^\beta + \cdots + K_1^{(\alpha)} \sum_\beta S_1^\beta$,

where the $K_w^{(\alpha)}$, $(w = 1, \cdots, \alpha - 1)$, are constants.

Consider the case in which $\alpha = n - 1$. Using the above expression for $P_\alpha$,

$P = dF(x_1, \cdots, x_n) + P_{n-1} + (1 - K_{n-2}^{(n-1)}) \sum_\beta S_{n-2}^\beta + \cdots + (1 - K_1^{(n-1)}) \sum_\beta S_1^\beta$.

Repeating this procedure successively for $\alpha = n - 2, n - 3, \cdots, 1$ shows that

$dF(x_1, \cdots, x_n) = P + C_{n-1} P_{n-1} + \cdots + C_1 P_1$,

where the $C_v$, $(v = 1, \cdots, n - 1)$, are constants.

Since each $F_i$ is the cdf of a symmetrical population with zero median,

$G_\beta/\alpha = \frac{1}{\alpha}$ (sum of the $\alpha$ of $F_1, \cdots, F_n$ pertaining to $\beta$)

is also the cdf for a continuous symmetrical population with zero median. But

$P_\alpha = \alpha^n \sum_\beta \left( \prod_{i=1}^{n} dG_\beta(x_i)/\alpha \right)$.

Hence $dF(x_1, \cdots, x_n)$ is equal to a sum of terms (multiplied by certain constants)
of the form

$n! \prod_{i=1}^{n} dF(x_i)$,

where $F$ is the cdf of a continuous symmetrical population with zero median.
Thus from (4) and the linear properties of the integral,

$\int_R dF(x_1, \cdots, x_n) = p$

if $y_1, \cdots, y_n$ are from populations necessarily satisfying only conditions (A).
Q.e.d.

Next confidence intervals of the forms (1) will be derived for $\phi$ on the basis of
conditions (A) and (B). Before stating the theorem on which these confidence
intervals are based consider the following definition of notation: For each permissible
selection of $i$ and $j$, the symbol

$\{i, j\}$, $(1 \le i \le j \le n)$,

denotes an arbitrary but fixed selection of one or both of the inequality signs
$<$, $>$. The selection of both inequality signs, denoted by $\gtrless$, has the interpretation

$x_i \gtrless \phi \equiv -\infty < x_i < \infty$,

$(x_i + x_j)/2 \gtrless \phi \equiv -\infty < (x_i + x_j)/2 < \infty$.

It is to be noted that $\{r, s\}$ is not necessarily equal to $\{i, j\}$ unless $r = i$ and
$s = j$.

Theorem 2. Consider the probability statement

(5) $\Pr[(x_i + x_j)/2 \ \{i, j\}\ \phi;\ 1 \le i \le j \le n]$.

Let this statement have the value $q$ if $x_1, \cdots, x_n$ are order statistics of a sample of
size $n$ drawn from the uniform population with range $-\frac{1}{2}$ to $\frac{1}{2}$ (then $\phi = 0$). Then
(5) also has the value $q$ if $x_1, \cdots, x_n$ are order statistics of a sample of size $n$ drawn
from any population satisfying conditions (A) and (B).

Proof. Let $y_1, \cdots, y_n$ be a sample of $n$ values from a population satisfying
conditions (A) and (B) while $x_1, \cdots, x_n$ are the $y$'s arranged in increasing order
of magnitude. Then there is a monotone function $\pi$ (see [4]) such that $\pi(z)$ will
have the same cdf as $y_i - \phi$ if $z$ is from a uniform population with range $-\frac{1}{2}$ to
$\frac{1}{2}$. Since the $y$'s are from a symmetrical population, $-\pi(z) = \pi(-z)$. Let $x_i - \phi =
\pi(z_i)$, $(i = 1, \cdots, n)$, define the $z_i$. Then

$\Pr[(x_i + x_j)/2 \ \{i, j\}\ \phi] = \Pr[\tfrac{1}{2}(\pi(z_i) + \pi(z_j)) \ \{i, j\}\ 0]$
$= \Pr[\pi(z_i) \ \{i, j\}\ -\pi(z_j)]$.

From the monotone and symmetrical properties of the function $\pi$,

$\Pr[\pi(z_i) \ \{i, j\}\ -\pi(z_j)] = \Pr[\pi(z_i) \ \{i, j\}\ \pi(-z_j)]$
$= \Pr[z_i \ \{i, j\}\ -z_j]$.

By hypothesis this last expression has the value $q$, thus completing the proof.

Many of the probability statements of the form (5) have zero probability. 
For example, $\Pr[x_1 > \phi, x_2 < \phi, \ldots] = 0$. Also many selections of the symbols 
$\{i, j\}$ result in equivalent probability statements. For example 

$$\Pr(x_1 \lessgtr \phi,\ x_2 < \phi) = \Pr(x_1 < \phi,\ x_2 < \phi).$$

An immediate consequence of Theorem 2 is that one-sided confidence 
intervals can be obtained for $\phi$ by choosing any specified subset of $(x_i + x_j)/2$, 
$(1 \le i \le j \le n)$, and considering an arbitrary but fixed order statistic of the 
values of this subset. For example, consider the subset consisting of $x_{n-1}$ and 
$(x_{n-2} + x_n)/2$. Then 

$$\Pr\{\max[x_{n-1}, (x_{n-2} + x_n)/2] < \phi\} = \Pr[(x_i + x_j)/2 \,\{i, j\}\, \phi],$$

where 

$$\{i, j\} = \begin{cases} < & \text{if either } i = j = n - 1; \text{ or } i = n - 2,\ j = n; \\ \lessgtr & \text{otherwise.} \end{cases}$$


In general, the confidence coefficient of any one-sided confidence interval 
formed by considering a certain order statistic of a specified subset of $(x_i + x_j)/2$, 
$(1 \le i \le j \le n)$, can be expressed as a sum of probabilities of the form (5), 
where $\{i, j\} = \lessgtr$ if $(x_i + x_j)/2$ is not included in the specified subset, $(i \le j)$. 

It is usually preferable to select the subset of $(x_i + x_j)/2$, $(1 \le i \le j \le n)$, 
in such a way that no two of the elements chosen necessarily have an order 
relation. 

Satisfactory two-sided confidence intervals can usually be obtained as combina¬ 
tions of one-sided confidence intervals. 


6. Confidence coefficients. The purpose of this section is to show that all 
the confidence coefficients for one-sided confidence intervals derived on the basis 
of Theorem 2 are of the form $r/2^n$, $(r = 1, \ldots, 2^n - 1)$. Also a method of 
determining confidence coefficient values for one-sided confidence intervals is 
developed. 

First a theorem will be presented which shows that each of the one-sided 
confidence intervals derived in the preceding section has a confidence coefficient of 
the form $r/2^n$, $(r = 1, \ldots, 2^n - 1)$. On the basis of Theorem 2 it is sufficient 
to prove: 





Theorem 3. Let $x_1, \ldots, x_n$ be the ordered values of a sample from the uniform 
population with range $-\tfrac12$ to $\tfrac12$. Then 

$$\Pr[(x_i + x_j)/2 \,\{i, j\}\, 0;\ 1 \le i \le j \le n] = r/2^n$$

where $r$ has one of the values $0, 1, \ldots, 2^n$. (The symbol $\{i, j\}$ is defined in section 
5.) 

Sketch of Proof. This theorem is proved by investigating how the 
hyperplanes 

$$\tfrac12(x_i + x_j) = 0, \qquad (1 \le i \le j \le n),$$

intersect the $n$-dimensional order statistic space for the particular population 
considered. It is found that each relation of the form 

$$\tfrac12(x_i + x_j) \,\{i, j\}\, 0, \qquad (1 \le i \le j \le n),$$

defines a region of the $n$-dimensional order statistic space which consists of a 
certain number $r$ of $n$-dimensional "basic" cells each of which has an 
$n$-dimensional "volume" equal to $(\tfrac12)^n$. A detailed proof of this theorem is given in 
[5]. 

Next a method will be developed whereby confidence coefficient values can 
be determined for any one-sided confidence interval of the form 

(6) $\tfrac12(x_i + x_j) \,\{i, j\}\, \phi, \qquad (1 \le i \le j \le n)$. 

For this purpose it is sufficient to derive a procedure for determining the 
confidence coefficient of any confidence interval of the form 

(7) $\max[\text{certain subset of } \tfrac12(x_i + x_j);\ 1 \le i \le j \le n] < \phi$. 

The confidence coefficient of any one-sided confidence interval of the form 
$\min[\ \ ] > \phi$ can be obtained by symmetry. The confidence coefficient of any 
other one-sided confidence interval of the form (6) can be found by expressing 
the value of 

$$\Pr[\tfrac12(x_i + x_j) \,\{i, j\}\, \phi]$$

as a sum of terms of the form $\Pr\{\max[\ \ ] < \phi\}$ or as a sum of terms of the form 
$\Pr\{\min[\ \ ] > \phi\}$. That this is always possible for one-sided confidence intervals 
of the form (6) is shown by direct application of the results of page 17 of [6]. 

It is not difficult to show that any one-sided confidence interval of the form 
(7) can be expressed in the form 

$$\max\{x(n - k),\ \tfrac12[x(n - k + 1) + x(n - m_k - k + 1)],\ \ldots,\ \tfrac12[x(n) + x(n - m_1)]\} < \phi,$$

where 

$$x(i) = x_i, \qquad (i = 1, \ldots, n),$$




and $m_1, \ldots, m_k$ are $k$ integers such that 

$$n \ge m_1 > m_2 > \cdots > m_k > 0.$$

This is done by choosing $k, m_1, \ldots, m_k$ so that the two confidence intervals are 
equivalent. 

Thus it is sufficient to prove the following theorem: 

Theorem 4. Let $x(1), \ldots, x(n)$ represent the ordered values of $n$ independent 
observations drawn from populations satisfying conditions (A). Choose a set of $k$ 
integers $m_1, \ldots, m_k$ such that 

$$n \ge m_1 > m_2 > \cdots > m_k > 0.$$

Then the one-sided confidence interval 

(8) $\max\{x(n - k),\ \tfrac12[x(n - k + 1) + x(n - m_k - k + 1)],\ \ldots,\ \tfrac12[x(n) + x(n - m_1)]\} < \phi$, 

where a term of the form $\tfrac12[x(n - h + 1) + x(n - m_h - h + 1)]$, $(h = 1, \ldots, k)$, 
is to be deleted if $n - m_h - h + 1 = 0$, has the confidence coefficient* 

(9) $$\frac{1}{2^n}\Bigg[1 + m_1 + \sum_{i_1=1}^{m_2}(m_1 - i_1) + \sum_{i_2=1}^{m_3}\ \sum_{i_1=1}^{m_2 - i_2}(m_1 - i_1 - i_2) + \cdots + \sum_{i_{k-1}=1}^{m_k}\ \sum_{i_{k-2}=1}^{m_{k-1} - i_{k-1}} \cdots \sum_{i_1=1}^{m_2 - i_2 - \cdots - i_{k-1}} (m_1 - i_1 - \cdots - i_{k-1})\Bigg].$$

Sketch of Proof. It is sufficient to consider the case in which the $n$ 
observations are a sample from the uniform population with range $-\tfrac12$ to $\tfrac12$ (then $\phi = 0$). 
Let us consider the region of the $n$-dimensional order statistic space defined by 
(8). This region can be considered as an intersection of $n$-dimensional regions 
each of which is completely defined by a certain region in an $x_i$, $x_j$ plane 
$(1 \le i \le j \le n)$. Also the $n$-dimensional "volume" of this region equals the 
value of the confidence coefficient of the confidence interval (8). 

By Theorem 3, the intersection region of (8) consists of a certain number of 
"basic" cells, each of $n$-dimensional "volume" $(\tfrac12)^n$. Theorem 4 is proved by 
developing a method for finding the number of "basic" cells in this intersection 
region on the basis of the corresponding regions in the $x_i$, $x_j$ planes. It is found 
that the intersection region consists of 

$$1 + m_1 + \cdots + \sum_{i_{k-1}=1}^{m_k} \cdots \sum_{i_1=1}^{m_2 - i_2 - \cdots - i_{k-1}} (m_1 - i_1 - \cdots - i_{k-1})$$

"basic" cells. A detailed derivation of this expression is given in [5]. 

Now consider some examples of the application of Theorem 4. Let $n = 11$, 
$m_1 = 11$, $m_2 = 5$, $m_3 = 2$. Then, by Theorem 4, the one-sided confidence 
interval 

$$\max[x_8,\ \tfrac12(x_9 + x_7),\ \tfrac12(x_{10} + x_5)] < \phi$$

* For the trivial case in which $k = n$ the value of (9) is unity. 








has a confidence coefficient equal to $103/2^{11}$. If $n = 12$ instead of 11, the 
confidence coefficient would be $103/2^{12}$ while the confidence interval becomes 

$$\max[x_9,\ \tfrac12(x_{10} + x_8),\ \tfrac12(x_{11} + x_6),\ \tfrac12(x_{12} + x_1)] < \phi.$$

As another example, let $n = 11$ and consider the confidence interval 

$$\max[x_8,\ \tfrac12(x_9 + x_7),\ \tfrac12(x_{10} + x_5),\ \tfrac12(x_{11} + x_4)] < \phi.$$

Here $k = 3$ and comparison with (8) shows that this confidence interval 
satisfies Theorem 4 with $m_1 = 7$, $m_2 = 5$, $m_3 = 2$. Thus it has a confidence 
coefficient equal to $51/2^{11}$. 
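The two numerators above (103 and 51) can be checked numerically. The following is a minimal sketch, not the author's procedure: `formula_9_bracket` evaluates the bracketed sum of (9) under the assumption that each inner upper limit is $m_j$ minus all outer summation indices, and `brute_force_count` enumerates the $2^n$ sign patterns of a sample with distinct absolute values $1, \ldots, n$, which is an illustrative stand-in for conditioning on the absolute values of a symmetric sample. Both function names are hypothetical.

```python
from itertools import product


def formula_9_bracket(m):
    """Bracketed sum in (9) for integers m = (m_1, ..., m_k), m_1 > ... > m_k > 0."""

    def nested(level, used):
        # level: index j of the m_j whose upper limit is applied next;
        # used: sum of the summation indices already fixed by outer sums.
        if level == 1:
            return m[0] - used              # summand m_1 - i_1 - ... - i_{t-1}
        upper = m[level - 1] - used         # assumed limit: m_j minus outer indices
        return sum(nested(level - 1, used + i) for i in range(1, upper + 1))

    # 1 + m_1 + (single sum) + (double sum) + ...; nested(1, 0) equals m_1.
    return 1 + sum(nested(t, 0) for t in range(1, len(m) + 1))


def brute_force_count(n, m):
    """Number of sign patterns (out of 2**n) for which interval (8) holds with phi = 0."""
    k = len(m)
    count = 0
    for signs in product((-1, 1), repeat=n):
        # Illustrative sample: absolute values 1..n with the chosen signs.
        x = [None] + sorted(s * a for s, a in zip(signs, range(1, n + 1)))
        ok = (n - k < 1) or (x[n - k] < 0)          # x(n - k) < 0
        for h in range(1, k + 1):
            low = n - m[h - 1] - h + 1
            if low >= 1:                            # term is deleted when low == 0
                ok = ok and (x[n - h + 1] + x[low] < 0)
        count += ok
    return count


print(formula_9_bracket((11, 5, 2)), brute_force_count(11, (11, 5, 2)))
print(formula_9_bracket((7, 5, 2)), brute_force_count(11, (7, 5, 2)))
```

Dividing either count by $2^n$ gives the confidence coefficient; the enumeration is feasible only for small $n$.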

Theorem 3 shows that each one-sided confidence interval developed on the 
basis of Theorem 2 has a confidence coefficient of the form $r/2^n$, $(0 \le r \le 2^n)$. 
The question arises as to whether the one-sided confidence intervals defined by 
Theorem 4 have confidence coefficients which attain each of the values $1/2^n$, 
$2/2^n, \ldots, (2^n - 1)/2^n$. That this is not the case is proved as follows: The 
totality of different confidence intervals of the form (8) is equal to $2^n - 1$. This 
is shown by counting how many ways the integers $m_1, \ldots, m_k$ can be selected 
subject to the conditions $n \ge m_1 > m_2 > \cdots > m_k > 0$. It is easily seen that 
there are $\binom{n}{k}$ possible ways. Summing over the possible values of $k$ yields 
$2^n - 1$. This figure is increased to $2^n$ if the confidence interval $x_n < \phi$ is also 
included. Examination of (9) shows, however, that two different selections 
of $m_1$, $m_2$, etc., will result in the same value of (9) for more than one case. 
Thus the one-sided confidence intervals of Theorem 4 do not have confidence 
coefficients which attain each of the values $1/2^n, \ldots, (2^n - 1)/2^n$. 

Although the class of one-sided confidence intervals defined by Theorem 4 does 
not have confidence coefficients which attain each of the values $1/2^n$, $2/2^n, \ldots,$ 
$(2^n - 1)/2^n$, it does have another property which is important from a practical 
point of view: If a certain confidence coefficient can be obtained for a particular 
value of $n$, then this confidence coefficient can also be obtained for all greater 
values of $n$. This result is a consequence of the following theorem: 

Theorem 5. Let $x(1), \ldots, x(n)$ be the ordered values of $n$ independent 
observations drawn from populations satisfying conditions (A). Then if a confidence 
interval of the form (8) has the confidence coefficient $\epsilon$ for a certain value $n_0$ of $n$, it is 
always possible to obtain another confidence interval of the form (8), which has the 
confidence coefficient $\epsilon$ for the value $n_0 + 1$. 

Proof. Let $m_1, \ldots, m_k$ be the integers corresponding to the given confidence 
interval of form (8). These integers satisfy the condition 

$$n_0 \ge m_1 > m_2 > \cdots > m_k > 0.$$

Let $n_0$ be replaced by $n_0 + 1$ and consider the new set of integers $(m_1 + 1)$, 
$(m_2 + 1), \ldots, (m_k + 1), 1$. Evidently 

$$n_0 + 1 \ge m_1 + 1 > \cdots > m_k + 1 > 1 > 0.$$





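Hence these integers can be used to define a confidence interval of the form (8). 
Also it is easily verified that 

$$1 + (m_1 + 1) + \sum_{i_1=1}^{m_2+1}(m_1 + 1 - i_1) + \cdots + \sum_{i_{k-1}=1}^{m_k+1} \cdots \sum_{i_1=1}^{m_2+1-i_2-\cdots-i_{k-1}} (m_1 + 1 - i_1 - \cdots - i_{k-1})$$

$$+ \sum_{i_k=1}^{1}\ \sum_{i_{k-1}=1}^{m_k+1-i_k} \cdots \sum_{i_1=1}^{m_2+1-i_2-\cdots-i_k} (m_1 + 1 - i_1 - \cdots - i_k)$$

$$= 2\Bigg[1 + m_1 + \sum_{i_1=1}^{m_2}(m_1 - i_1) + \cdots + \sum_{i_{k-1}=1}^{m_k} \cdots \sum_{i_1=1}^{m_2-i_2-\cdots-i_{k-1}} (m_1 - i_1 - \cdots - i_{k-1})\Bigg].$$

Since $n_0$ has been replaced by $n_0 + 1$, the factor 2 is cancelled by the change of 
denominator from $2^{n_0}$ to $2^{n_0+1}$. 
Thus the new confidence interval has the same confidence coefficient as the given 
confidence interval. 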
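The doubling used in the proof of Theorem 5 can be spot-checked for particular integer sets. This is a small sketch under the same assumption about the nested summation limits in (9) as stated in its docstring; `cells` and `extend` are hypothetical helper names, and the listed integer sets are arbitrary illustrations.

```python
def cells(m):
    """Bracketed sum of (9), assuming inner limits are m_j minus all outer indices."""

    def nested(level, used):
        if level == 1:
            return m[0] - used
        return sum(nested(level - 1, used + i)
                   for i in range(1, m[level - 1] - used + 1))

    return 1 + sum(nested(t, 0) for t in range(1, len(m) + 1))


def extend(m):
    """Integers (m_1 + 1, ..., m_k + 1, 1) used when n_0 is replaced by n_0 + 1."""
    return tuple(mi + 1 for mi in m) + (1,)


for m in [(7,), (7, 5), (7, 5, 2), (11, 5, 2), (9, 6, 4, 1)]:
    # The cell count doubles, so count / 2**n is unchanged when n grows by one.
    print(m, cells(m), extend(m), cells(extend(m)))
```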

From symmetry considerations, the one-sided confidence interval 

$$\min\{x(k + 1),\ \tfrac12[x(k) + x(m_k + k)],\ \ldots,\ \tfrac12[x(1) + x(m_1 + 1)]\} > \phi,$$

where a term of the form $\tfrac12[x(h) + x(m_h + h)]$, $(h = 1, \ldots, k)$, is to be deleted 
if $m_h + h = n + 1$, has the same confidence coefficient as the one-sided 
confidence interval (8); i.e. its confidence coefficient is given by (9). 

7. Efficiency of some tests based on conditions (A). Let us consider the case 
in which the $n$ observations used for a test are a sample from a normal population 
with unknown variance. The purpose of this section is to investigate the 
efficiency of some tests based on conditions (A) for this special case. 

The method used to obtain efficiencies is outlined in section 3. Only one-sided 
and symmetrical tests are considered. For this purpose it is sufficient to limit 
investigations to one-sided tests of $\phi < \phi_0$. 

If the subset of $\tfrac12(x_i + x_j)$, $(1 \le i \le j \le n)$, chosen for a test is not of one of 
the forms 

(a) $x_i$; 

(b) $\tfrac12(x_i + x_j)$, $(i < j)$; 

(c) $x_j$, $\tfrac12(x_i + x_k)$, $(i < j < k)$, 

the determination of power function values requires a numerical double or higher 
order integration. Such numerical integrations are extremely lengthy. For 
this reason only one-sided significance tests based on subsets of the forms (a)-(c) 
will be investigated. 

Let the normal population have variance $\sigma^2$ and consider one-sided tests of 
$\phi < \phi_0$ based on subsets of the form (a). Then 









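$$\text{Power Function} = \Pr(x_i < \phi_0) = \Pr\!\left[\frac{x_i - \phi}{\sigma} < \delta\right] = \sum_{t=i}^{n} \binom{n}{t} [N(\delta)]^t [1 - N(\delta)]^{n-t},$$

where 

$$\delta = (\phi_0 - \phi)/\sigma, \qquad N(\delta) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\delta} e^{-y^2/2}\, dy.$$

The power function values listed for the test $x_i < \phi_0$ in Table 2 were computed 
from the above expression. The corresponding values for the $t$-test were 
computed from the normal approximation given in [2]. 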
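As an illustration of how such power values can be obtained, the sketch below evaluates the probability that the $i$th order statistic (in increasing order) of $n$ normal observations falls below $\phi_0$, written as a binomial tail in $N(\delta)$. It assumes that reading of the power expression; the function names are hypothetical and the values of $n$, $i$ and $\delta$ are arbitrary.

```python
from math import comb, erf, sqrt


def normal_cdf(d):
    """N(d): standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(d / sqrt(2.0)))


def power_order_statistic_test(n, i, delta):
    """Pr(x_i < phi_0), where delta = (phi_0 - phi) / sigma.

    The i-th smallest of n observations lies below phi_0 exactly when at
    least i of the observations lie below phi_0.
    """
    p = normal_cdf(delta)
    return sum(comb(n, t) * p**t * (1.0 - p) ** (n - t) for t in range(i, n + 1))


# At delta = 0 the power reduces to the significance level of the test.
print(power_order_statistic_test(10, 10, 0.0))   # equals 1 / 2**10
print(power_order_statistic_test(10, 10, 1.0))
```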

For subsets of forms (b) and (c) the expression for the power function is more 
complicated and will not be either derived or stated here. For any particular 
case, however, a simple analysis will yield an expression for the power function 
which requires only a first order numerical integration. General expressions 
for the power functions when the subsets are of the forms (b) and (c) are stated 
and derived in [5]. 

Table 2 contains power function values and efficiencies for several tests based 
on subsets of the forms (b) and (c). The power function values were computed 
by approximate integration (Simpson's rule, etc.). The $t$-test power function 
values were obtained by using the normal approximation. The power 
efficiencies listed in Table 1 for tests which do not appear in Table 2 were computed in 
[5], where a table of power function values is also given. 

Examination of Table 2 shows that many of the tests formed from subsets of 
types (b) and (c) are very efficient for small values of $n$. The efficiency appears 
to decrease as $n$ increases. Also the efficiency of a test depends strongly on the 
subset of $\tfrac12(x_i + x_j)$, $(1 \le i \le j \le n)$, used to form the test. For example, 
let $n = 10$. The test 

Accept $\phi < \phi_0$ if $\max[x_9,\ \tfrac12(x_1 + x_{10})] < \phi_0$ 

has a significance level of approximately .01 but an efficiency of only 82%. 
However the test 

Accept $\phi < \phi_0$ if $\max[x_8,\ \tfrac12(x_6 + x_{10})] < \phi_0$ 

also has a significance level of approximately .01 but an efficiency of 96.5%. 

An approximate set of rules for picking subsets which result in efficient tests 
of $\phi < \phi_0$ is suggested by the results of Table 2. Let $x(i_1), \ldots, x(i_r)$ be the 
order statistics which make up the elements of the particular subset of $\tfrac12(x_i + x_j)$, 
$(1 \le i \le j \le n)$, to be used for the test. The approximate rules are 

1. Use the maximum of the values of the elements of the subset. 

2. Choose $i_1, \ldots, i_r$ so that $\max(i_1, \ldots, i_r) = n$ and $\min(i_1, \ldots, i_r)$ is as 
large as possible subject to the restriction that the test is to have a 
significance level of a specified order of magnitude. 

Symmetry considerations furnish the corresponding set of rules for obtaining 
efficient tests of $\phi > \phi_0$. 





Other tests at approximately the same significance levels but not based on 
subsets of the forms (a)-(c) are undoubtedly more efficient than many of the tests 
considered in Tables 1 and 2 (particularly for the larger values of $n$). 
Computational difficulties, however, prevent consideration of more general situations. 

8. A general solution.* A general method of obtaining one-sided tests of 
$\phi < \phi_0$ and $\phi > \phi_0$, also symmetrical tests of $\phi \ne \phi_0$, on the basis of conditions 
(A) is the following: 

Let $y_1, \ldots, y_n$ be $n$ independent observations drawn from populations 
satisfying conditions (A). Let 

$$z_i = y_i - \phi_0, \qquad (i = 1, \ldots, n).$$

If the null hypothesis of $\phi = \phi_0$ is satisfied, each $z_i$ is an observation from a 
population satisfying conditions (A) with zero median. Consider the $2^n$ sets of values 
obtained by the transformations 

$$z_i \to \epsilon(i) z_i, \qquad (i = 1, \ldots, n),$$

where $\epsilon(i)$ is one of the signs $+$ or $-$. Form the mean of each of the $2^n$ sets of 
values. Then it is readily seen, from conditions (A), that the probability that 
$\bar z\ (= \sum z_i/n)$ is less than the $(r + 1)$th largest of the $2^n$ means has the value 
$r/2^n$ when the null hypothesis is true. (Here, as for order statistics, the $j$th largest 
denotes the $j$th mean when the means are arranged in increasing order.) Similarly the probability that $\bar z$ is greater 
than the $(2^n - r)$th largest of the $2^n$ means is equal to $r/2^n$ if the null hypothesis 
of $\phi = \phi_0$ is satisfied. Thus the test 

Accept $\phi < \phi_0$ if $\bar z$ is less than the $(r + 1)$th largest of the $2^n$ means 

is a one-sided test of $\phi < \phi_0$ with significance level equal to $r/2^n$. Likewise the 
one-sided test 

Accept $\phi > \phi_0$ if $\bar z$ is greater than the $(2^n - r)$th largest of the $2^n$ means 

has the significance level $r/2^n$. Consequently the symmetrical test 

Accept $\phi \ne \phi_0$ if $\bar z$ is either less than the $(r + 1)$th largest or greater 
than the $(2^n - r)$th largest of the $2^n$ means 

has a significance level equal to $2r/2^n$. 

The application of any of the above tests requires the computation of the 2" 
means and a determination of where z falls in the ordering of these means. If 
n = 5, only 32 means need be computed. If n = 10, however, 1024 means must 
be computed. Evidently this test is too cumbersome to apply except for very 
small values of n. 
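The procedure of this section can be sketched directly for small $n$. The code below is an illustrative sketch only: it forms the $2^n$ sign-changed means, sorts them in increasing order, and locates $\bar z$ among them, taking "the $j$th largest" to mean the $j$th mean in increasing order. The function names and the sample data are hypothetical.

```python
from itertools import product


def sign_flip_means(z):
    """All 2**n means of (+-z_1, ..., +-z_n), sorted in increasing order."""
    n = len(z)
    return sorted(sum(s * zi for s, zi in zip(signs, z)) / n
                  for signs in product((1, -1), repeat=n))


def one_sided_tests(y, phi0, r):
    """Return (accept phi < phi0, accept phi > phi0), each at level r / 2**n."""
    z = [yi - phi0 for yi in y]
    means = sign_flip_means(z)
    zbar = sum(z) / len(z)
    accept_less = zbar < means[r]                        # below the (r+1)th mean
    accept_greater = zbar > means[len(means) - r - 1]    # above the (2**n - r)th mean
    return accept_less, accept_greater


# Hypothetical sample of n = 5 observations, tested against phi0 = 0 with r = 1.
sample = [-1.2, -0.7, -2.1, -0.4, -1.5]
print(len(sign_flip_means(sample)), one_sided_tests(sample, 0.0, 1))
```

The enumeration grows as $2^n$, which is the computational burden noted in the text.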

* This solution was derived independently by E. J. G. Pitman and the author. The 
fundamental idea on which the solution is based was presented by R. A. Fisher in [7]. 

9. Acknowledgements. The author would like to express his appreciation to 
Professors S. S. Wilks and John W. Tukey for valuable advice and assistance in 
the preparation of this paper, also to Mrs. Ruth S. Shafer for computational 
assistance. 


REFERENCES 

[1] John E. Walsh, "Applications of some significance tests for the median which are valid 
under very general conditions," submitted to Am. Stat. Assn. Jour. 

[2] N. L. Johnson and B. L. Welch, "Applications of the non-central t-distribution," 
Biometrika, Vol. 31 (1940), p. 376. 

[3] John E. Walsh, "On the power function of the sign test for slippage of means," Annals 
of Math. Stat., Vol. 17 (1946), pp. 360-361. 

[4] H. Scheffé and J. W. Tukey, "Non-parametric estimation. I. Validation of order 
statistics," Annals of Math. Stat., Vol. 16 (1945), pp. 187-192. 

[5] John E. Walsh, "Some significance tests for the median which are valid under very 
general conditions," unpublished thesis, Princeton University. 

[6] G. Udny Yule and M. G. Kendall, An Introduction to the Theory of Statistics, Griffin 
and Co., 1947. 

[7] R. A. Fisher, The Design of Experiments, Oliver and Boyd, 1942. 



A DIRECT METHOD FOR PRODUCING RANDOM DIGITS IN ANY 

NUMBER SYSTEM 


By H. Burke Horton and R. Tynes Smith III 
Interstate Commerce Commission 

1. Summary. A compounding technique first used to produce random binary 
digits is generalized and extended to other number systems. Formulae for the 
rate of convergence of probabilities to the desired values are derived. The 
method is extended to the production of random digits with fixed but unequal 
probabilities. Numerical results are presented in summary form together with 
results of tests applied to a set of random digits produced by the method. 

2. Introduction. In a note [1] by one of the authors a method of producing 
random digits was presented. The method was based upon a process, designated 
“compound randomization,” used to produce random binary digits, which can be 
converted to random digits in other number systems by simple methods. De¬ 
spite the ease of converting a random binary series to another system, it is of 
interest to examine the problem of direct production of random digits in any 
number system. In the course of producing random binary digits with machine 
tabulating equipment, and while designing an electronic device to produce ran¬ 
dom binary digits, it was noted that the multiplication process described in the 
earlier paper was the equivalent of addition modulo 2 of a series of binary digits. 
This observation laid the basis for generalizing to other number systems.¹ 

3. Initial conditions and notation. Let us assume that there is available a 
source of digits, $0, 1, 2, \ldots, (n - 1)$, in a number system of base $n$, where $n$ is a 
positive integer, $n > 1$. Let $p_{rs}$ represent the probability of obtaining the $r$th 
digit in the $s$th trial. Assume that initial conditions can be controlled so that 
the trials are independent² and 

(3.1) $p_{rs} \ge \epsilon$ 

where $0 < \epsilon \le 1/n$ is a fixed positive number. (It may be noted at this point 
that conventional "single-stage" methods of producing random numbers are 
based upon the assumption that $p_{rs} = \epsilon = 1/n$.) Let $\pi_{rs}$ represent the 
probability of obtaining the $r$th digit by addition modulo $n$ of the digits obtained in 
$s$ individual trials. In order to express $\pi_{rs}$ in terms of $p_{rs}$, consider two sets of 
matrices whose elements are defined as follows: 

¹ In acting as referee for [1] Dr. George W. Brown suggested generalizing to other number 
systems by addition modulo $n$. 

² J. E. Walsh [2] has considered, in terms of conditional probabilities, the effect of 
intercorrelation on compound randomization in the binary system. 

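(3.2) $$a_s = \begin{Vmatrix} p_{0,s} & p_{n-1,s} & p_{n-2,s} & \cdots & p_{1,s} \\ p_{1,s} & p_{0,s} & p_{n-1,s} & \cdots & p_{2,s} \\ p_{2,s} & p_{1,s} & p_{0,s} & \cdots & p_{3,s} \\ \vdots & \vdots & \vdots & & \vdots \\ p_{n-1,s} & p_{n-2,s} & p_{n-3,s} & \cdots & p_{0,s} \end{Vmatrix}$$

(3.3) $$\alpha_s = \begin{Vmatrix} \pi_{0,s} & \pi_{n-1,s} & \pi_{n-2,s} & \cdots & \pi_{1,s} \\ \pi_{1,s} & \pi_{0,s} & \pi_{n-1,s} & \cdots & \pi_{2,s} \\ \pi_{2,s} & \pi_{1,s} & \pi_{0,s} & \cdots & \pi_{3,s} \\ \vdots & \vdots & \vdots & & \vdots \\ \pi_{n-1,s} & \pi_{n-2,s} & \pi_{n-3,s} & \cdots & \pi_{0,s} \end{Vmatrix}$$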

Note that $a_s$ and $\alpha_s$ are Markoff matrices with two additional restrictions: (1) 
there are no zero elements, and (2) column (as well as row) sums are unity. Each 
$n \times n$ matrix is made up of only $n$ distinct elements, namely, the $n$ different 
probabilities associated with the $s$th trial for $a_s$, or the $n$ different probabilities 
associated with the sum of $s$ trials for $\alpha_s$. 

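4. Relation of $\pi_{rs}$ to $p_{rs}$. Assuming independent trials, we have the following 
relationships: 

(4.1) $$\begin{aligned} \alpha_1 &= a_1; \\ \alpha_2 &= a_2 \cdot \alpha_1 = a_2 \cdot a_1; \\ \alpha_3 &= a_3 \cdot \alpha_2 = a_3 \cdot a_2 \cdot a_1; \\ &\ \ \vdots \\ \alpha_k &= a_k \cdot \alpha_{k-1} = \prod_{s=1}^{k} a_s. \end{aligned}$$

Thus, since any row (or any column) of $\alpha_k$ is a permutation of the $\pi_{rk}$, by (4.1) 
the $\pi_{rk}$ are expressed in terms of the individual probabilities, $p_{rs}$. 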
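The relations (4.1) can be illustrated numerically. In the sketch below, `circulant` builds a matrix with the layout of (3.2), `add_mod_n` forms the distribution of a sum modulo $n$ directly, and the first column of the matrix product is compared with the direct result. All function names are hypothetical, and the probabilities used are arbitrary illustrations.

```python
def circulant(p):
    """Matrix of form (3.2): entry (row r, column c) is p[(r - c) mod n]."""
    n = len(p)
    return [[p[(r - c) % n] for c in range(n)] for r in range(n)]


def matmul(a, b):
    """Plain matrix product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]


def add_mod_n(pi, p):
    """Distribution of (X + Y) mod n for independent X ~ pi and Y ~ p."""
    n = len(p)
    return [sum(p[(r - j) % n] * pi[j] for j in range(n)) for r in range(n)]


# Two trials in base 3 with different (illustrative) probabilities.
p1 = [0.5, 0.3, 0.2]
p2 = [0.6, 0.1, 0.3]

alpha2 = matmul(circulant(p2), circulant(p1))    # alpha_2 = a_2 . a_1
direct = add_mod_n(p1, p2)                       # pi_{r,2} computed directly

print([row[0] for row in alpha2])   # first column of alpha_2
print(direct)
```

Because every row and column of $\alpha_k$ is a permutation of the same $n$ values, tracking a single column (as `add_mod_n` does) carries the same information as the full matrix product.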

5. Convergence of $\pi_{rk}$ to $1/n$. (5.01) Theorem.³ $\lim_{k \to \infty} \pi_{rk} = 1/n$. 

Proof. Let $\rho_s$ denote the range of the elements of $\alpha_s$. Each element of 
$\alpha_s$ is a weighted mean of the $n$ distinct elements of $\alpha_{s-1}$. The $n$ distinct elements 
of $a_s$ are used as weights in the averaging process. Now the range of a set of 
weighted means (weights > 0) of a set of values must be less than the range of 
the values themselves, unless both ranges are zero. Therefore, since the weights, 
$p_{rs} > 0$ by condition (3.1), 

(5.02) $\rho_s < \rho_{s-1}$, for $\rho_{s-1} \ne 0$, or in the special case $\rho_{s-1} = 0$, $\rho_s = 0$. 

Also, since $\sum_{r=0}^{n-1} \pi_{rs} = 1$, 

(5.03) $1/n - \rho_s \le \pi_{rs} \le 1/n + \rho_s$. 

³ While this article was awaiting publication, J. Wolfowitz independently proved theorem 
(5.01). 

In order to show that $\lim_{s \to \infty} \rho_s = 0$, and to derive formulae for the rate of 
convergence of $\pi_{rs}$ to the limiting value, $1/n$, let $w_i$ represent the ordered $p_{rs}$ for any 
given $s$: $w_1$ = the smallest $p_{rs}$, $\ldots$, $w_n$ = the largest of the $p_{rs}$. In a similar 
manner let $x_i$ represent the ordered $\pi_{r,s-1}$. The following inequalities for the 
maximum and minimum $\pi_{rs}$ can be set down immediately: 

(5.04) $\max_r \pi_{rs} \le w_n x_n + w_{n-1} x_{n-1} + \cdots + w_1 x_1$, 

(5.05) $\min_r \pi_{rs} \ge w_n x_1 + w_{n-1} x_2 + \cdots + w_1 x_n$. 

And since $\rho_s = \max_r \pi_{rs} - \min_r \pi_{rs}$, 

(5.06) $\rho_s \le w_n(x_n - x_1) + w_{n-1}(x_{n-1} - x_2) + \cdots + w_2(x_2 - x_{n-1}) + w_1(x_1 - x_n)$. 

For $n$ even, let $m = n/2 + 1$, then by regrouping terms, 

(5.07) $\rho_s \le (w_n - w_1)(x_n - x_1) + (w_{n-1} - w_2)(x_{n-1} - x_2) + \cdots + (w_m - w_{m-1})(x_m - x_{m-1})$. 

Noting that $\rho_{s-1} = (x_n - x_1) \ge (x_{n-1} - x_2) \ge \cdots \ge (x_m - x_{m-1})$, the following 
substitutions can be made: 

(5.08) $\rho_s \le (w_n - w_1)\rho_{s-1} + (w_{n-1} - w_2)\rho_{s-1} + \cdots + (w_m - w_{m-1})\rho_{s-1}$. 

For compactness, this may be written, 

(5.09) $\rho_s \le \Big[\sum_{i=m}^{n} w_i - \sum_{i=1}^{m-1} w_i\Big] \rho_{s-1}$. 

Similarly for $n$ odd, let $m = (n + 1)/2$; proceeding in the same manner as above, 
the median term vanishes, yielding as a final result, 

(5.10) $\rho_s \le \Big[\sum_{i=m+1}^{n} w_i - \sum_{i=1}^{m-1} w_i\Big] \rho_{s-1}$. 

For simplicity denote the expression in brackets by $\delta_s$; then 

(5.11) $\rho_s \le \delta_s \cdot \rho_{s-1}$, 

where for $n$ even, $\delta_s$ represents the sum of the largest $n/2$ of the $p_{rs}$ minus the 
sum of the smallest $n/2$ of the $p_{rs}$, and for $n$ odd $\delta_s$ represents the sum of the 
largest $(n - 1)/2$ of the $p_{rs}$ minus the sum of the smallest $(n - 1)/2$ of the $p_{rs}$. 
Continuing the process developed above, we find that 

(5.12) $\rho_s \le \delta_s \cdot \delta_{s-1} \cdot \rho_{s-2}$; 

(5.13) $\rho_s \le \delta_s \cdot \delta_{s-1} \cdot \delta_{s-2} \cdots \delta_2 \cdot \rho_1$. 






Since $\delta_1 \ge \rho_1$, the following simple inequality holds: 

(5.14) $\rho_k \le \prod_{s=1}^{k} \delta_s$. 

Now $\delta_s \le 1 - n\epsilon$, by condition (3.1) and the definition of $\delta_s$. Therefore, 

(5.15) $\lim_{k \to \infty} \rho_k \le \lim_{k \to \infty} \prod_{s=1}^{k} \delta_s \le \lim_{k \to \infty} (1 - n\epsilon)^k = 0$, 

and (5.01) is proven. In the special case of constant probabilities from trial to 
trial, $\delta_s = \delta_0$, a constant, and (5.14) becomes 

(5.16) $\rho_k \le (\delta_0)^k$. 

Since the mean $\pi_{rk}$ is $1/n$, we have the following useful inequalities: 

(5.17) $1/n - \prod_{s=1}^{k} \delta_s \le \pi_{rk} \le 1/n + \prod_{s=1}^{k} \delta_s$ 

in the case of varying probabilities, and 

(5.18) $1/n - (\delta_0)^k \le \pi_{rk} \le 1/n + (\delta_0)^k$ 

in the case of constant probabilities. If $\delta_s$ is not known in each trial, an upper 
bound, $\delta_b$, may be estimated on the basis of knowledge (including statistical 
tests) of the digit generating process. Then the following inequality will hold: 

(5.19) $1/n - (\delta_b)^k \le \pi_{rk} \le 1/n + (\delta_b)^k$ 

where $\delta_b \le (1 - n\epsilon)$. 

It is worthy of note that inequalities (5.11) and (5.15) become equalities if $n = 2$ 
(binary system), thus, 

(5.14b) $\rho_k = \prod_{s=1}^{k} \delta_s = \prod_{s=1}^{k} |p_s - q_s| = \prod_{s=1}^{k} |2p_s - 1|$; 

(5.15b) $\rho_k = (\delta_0)^k = |p - q|^k = |2p - 1|^k$. 

These results were obtained by different methods in [1]. 
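The bound (5.16) can be compared with the exact range for constant probabilities. The following sketch computes $\delta_0$ as defined after (5.11) and the range $\rho_k$ by repeated addition modulo $n$; the function names are hypothetical, and the probability sets are small examples chosen for illustration.

```python
def add_mod_n(pi, p):
    """Distribution of (X + Y) mod n for independent X ~ pi and Y ~ p."""
    n = len(p)
    return [sum(p[(r - j) % n] * pi[j] for j in range(n)) for r in range(n)]


def delta(p):
    """Sum of the largest half of p minus the sum of the smallest half.

    For odd n the median value is left out, as in (5.10).
    """
    w = sorted(p)
    half = len(w) // 2
    return sum(w[len(w) - half:]) - sum(w[:half])


def range_after(p, k):
    """rho_k: range of the pi_{rk} after k trials with constant probabilities p."""
    pi = list(p)
    for _ in range(k - 1):
        pi = add_mod_n(pi, p)
    return max(pi) - min(pi)


for p in ([0.8, 0.2], [0.5, 0.3, 0.2], [0.4, 0.3, 0.3], [0.2, 0.1, 0.4, 0.3]):
    rho, bound = range_after(p, 10), delta(p) ** 10
    print(len(p), f"{rho:.10f}", f"{bound:.10f}")
```

For base 2 the two printed values coincide, in line with (5.15b); for larger bases the bound is generally conservative.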

6. Discussion of results. Certain facts are implicit in the foregoing analysis, 
but are worthy of mention in passing. The compounding process may consist 
of addition modulo $n$ of digits taken from a number of digit-producing machines. 
If any machine, $h$, is perfect, i.e., $p_{rh} = 1/n$ for all $r$, each element of the 
probability matrix will be equal to $1/n$, and $\rho_h = 0$. Consequently, each element of 
$\alpha_s$, $s \ge h$, will be equal to $1/n$ by (5.17) and the special case of (5.02). Thus 
any combination which contains a perfect machine is perfect. This is equivalent 
to a restatement of Von Mises' [3] requirement that the sum of a random set 
and any other set must itself be a random set. Furthermore, by (5.02) the 
results taken from any machine, no matter how nearly perfect, can be improved 
by combining with the results of another machine, no matter how biased the 
latter may be. In the limiting case, $p_{rs} = 1$ (or 0), the probabilities of the 
various digits are merely interchanged. 


7. Production of random numbers with fixed but unequal probabilities. The 
principles presented above can be adapted to the production of random numbers 
with unequal probabilities as follows: Assume that a set of random digits, $0, 1, 
2, \ldots, (n - 1)$, is required in a number system of base $n$, with probabilities $q_0$, 
$q_1, q_2, \ldots, q_{n-1}$, $\sum q_i = 1$, where each $q_i$ is a proper rational fraction which 
may be written as the quotient of two positive integers, $q_i = u_i/v_i$. Choose $m$ 
as the basis of a new number system, where $m$ is the least common multiple of 
the $v_i$, 

(7.1) $$q_i = \frac{u_i}{v_i} = \frac{m u_i / v_i}{m}.$$

A set of random digits, $0, 1, 2, \ldots, (m - 1)$, in a number system of base $m$ may 
be generated by the process described above, or a set of such digits may be 
constructed by entering an existing table of random digits, base $n$, and interpreting 
appropriate numerical quantities, base $n$, as digit symbols, base $m$. 

Since $m u_i / v_i$ is an integer, groups of digits, $m u_0/v_0$, $m u_1/v_1, \ldots$, in the $m$ system may be 
coded as digits, $0, 1, 2, \ldots, (n - 1)$, in the $n$ system. An upper bound for the 
maximum bias of $q_i$ will be $(m u_i / v_i)\,\rho_k$, where $\rho_k$ is the range of $\pi_{rk}$ in the $m$ system. 
Thus, by increasing $k$, the bias of $q_i$ can be made smaller than any preassigned 
quantity. 
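The coding step can be sketched as a lookup table from base-$m$ digits to base-$n$ digits. This is one possible arrangement, assumed for illustration: the text only requires that $m u_i / v_i$ of the base-$m$ digits be assigned to digit $i$, and here they are taken as consecutive blocks. The function name `coding_table` and the example probabilities are hypothetical.

```python
from fractions import Fraction
from math import gcd


def coding_table(q):
    """Return (m, table) where table[d] is the base-n digit coded by base-m digit d.

    q lists the required probabilities q_i = u_i / v_i as exact fractions.
    """
    q = [Fraction(x) for x in q]
    if sum(q) != 1:
        raise ValueError("probabilities must sum to 1")
    m = 1
    for x in q:                       # least common multiple of the v_i
        m = m * x.denominator // gcd(m, x.denominator)
    table = []
    for digit, x in enumerate(q):
        table.extend([digit] * int(m * x))    # block of m * u_i / v_i codes
    return m, table


# Illustrative target: digits 0, 1, 2 with probabilities 1/2, 1/3, 1/6.
m, table = coding_table([Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)])
print(m, table)
```

A uniformly random base-$m$ digit $d$ is then replaced by `table[d]`, so that digit $i$ appears with probability $q_i$ up to the bias bound given in the text.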


8. Convergence under more general conditions. Convergence of $\pi_{rs}$ to $1/n$ 
occurs under a variety of conditions less restrictive than (3.1). 

(8.1) Theorem. In the case of independent trials, a necessary and sufficient 
condition that $\lim_{s \to \infty} \pi_{rs} = 1/n$ is that $\pi_{0t} \ge \epsilon$ and $\pi_{1t} \ge \epsilon$, where $\epsilon$ is a fixed positive number, 
arbitrarily small, and $t$ is a fixed positive integer, arbitrarily large. It is obvious 
that (8.1) is a necessary condition for convergence. To prove that it is a 
sufficient condition, consider the following: 

(8.2) Lemma. If $p_{0s} \ge \eta$ and $p_{1s} \ge \eta$, where $\eta$ is a fixed positive number, arbitrarily small, 
then $\lim_{s \to \infty} \pi_{rs} = 1/n$. 

Proof: Take a fixed integer, $h$, $h \ge n - 1$. Now any digit, $r$, can be obtained 
in at least one way; i.e., as the sum of $r$ ones and $(h - r)$ zeros. Therefore, 

(8.3) $\pi_{rh} \ge \tau$, where $\tau = \eta^h$. 

We now regard $h$ trials as a single trial of a complex machine. Let $u$ represent 
the number of such complex trials. Let $\pi'_{ru}$ represent the probability of 
obtaining the $r$th digit as the result of addition modulo $n$ of $u$ complex trials. Then, 

(8.4) $\lim_{u \to \infty} \pi'_{ru} = \lim_{u \to \infty} \pi_{r,(uh)} = 1/n$, 

by (5.01). Now $s = uh + j$, $0 \le j < h$, ($j$ an integer), or $uh \le uh + j < 
(u + 1)h$. The $j$ simple trials cannot increase the maximum bias, by (5.02): 
consequently, 

(8.5) $\lim_{u \to \infty} \pi_{r,(uh + j)} = \lim_{(uh + j) \to \infty} \pi_{r,(uh + j)} = 1/n$. 

Since there is a one-to-one correspondence between the elements of $\{s\}$ and 
$\{uh + j\}$, 

(8.6) $\lim_{s \to \infty} \pi_{rs} = 1/n$. 

By a natural extension of the lemma, we may regard $t$ trials as a single 
complex trial. Theorem (8.1) thus assumes the form of (8.2). 

9. Numerical results in various number systems. More efficient convergence 
formulae can be devised to meet special conditions. Those presented in (5) 
have the advantages of simplicity and generality. To test the efficiency of 
(5.15) several numerical examples, based upon unusual hypothetical probabilities, 
were worked by matrix multiplication as in (4.1). In these problems $p_{rs} = p_r$, 
a constant, from trial to trial. A tabular comparison of the ranges, computed 
by (4.1), and the upper bounds, determined by (5.15), is presented in Table 1 
for $k = 10$. 

10. Preparation and tests of a set of random digits. Since an unlimited num¬ 
ber of valid tests for randomness may be devised, it is obvious that any finite 
set of digits cannot meet all such tests. As a matter of fact a truly random proc¬ 
ess should yield sets which fail to meet some proportion of the tests, the fraction 
being determined by the level of significance adopted in testing. No finite set 
of digits can be considered random; the tests for randomness are really applied 
to determine the character of the generating process. However, the concept of 
"locally random" sets as developed by Kendall and Smith [4] is useful, and some 
of their tests are used below as evidence that a set of numbers produced by com¬ 
pound randomization is likely to be locally random. 

A non-random set of 10,000 decimal digits having the relative frequencies 
indicated in the starred line of Table 1 was punched in cards and tabulated. 
Totals were taken for each ten cards and the amount in the unit’s position of the 
counter was cut in a summary card, thereby producing a set of 1,000 digits. 
The frequencies of digits in the derived set are compared with those of the gen¬ 
erating set in Table 2. The frequencies of the derived set are in accord with the 
hypothesis of equal probabilities. 



TABLE 1

Comparison of computed range and formula for maximum bias, k = 10
Hypothetical numerical examples, constant probabilities from trial to trial

Number                   Probability in an individual trial                     Computed      Formula
 base  p0    p1    p2    p3    p4    p5    p6    p7    p8    p9    p10   p11   range         (δ0)^10       δ0

 2     .800  .200  -     -     -     -     -     -     -     -     -     -     .0060466176   .0060466176   .600
 3     .500  .300  .200  -     -     -     -     -     -     -     -     -     .0000018357   .0000059049   .300
 3     .970  .020  .010  -     -     -     -     -     -     -     -     -     .6616765365   .6648326360   .960
 3     .400  .300  .300  -     -     -     -     -     -     -     -     -     .0000000001   .0000000001   .100
 4     .200  .100  .400  .300  -     -     -     -     -     -     -     -     .0000032768   .0001048576   .400
 5     .050  .200  .400  .020  .330  -     -     -     -     -     -     -     .0007878177   .0156833688   .660
 6     .080  .240  .360  .020  .200  .100  -     -     -     -     -     -     .0000168472   .0060466176   .600
 7     .300  .020  .240  .050  .130  .170  .090  -     -     -     -     -     .0001778804   .0025329516   .550
 8     .200  .050  .060  .180  .160  .090  .150  .110  -     -     -     -     .0000000965   .0000627821   .380
 9     .030  .080  .150  .060  .140  .090  .190  .050  .210  -     -     -     .0000052328   .0005259913   .470
 10    .050  .150  .200  .050  .050  .120  .080  .020  .180  .100  -     -     .0000132662   .0009765625   .500
 10    .010  .020  .030  .040  .050  .060  .070  .080  .090  .550  -     -     .0012522218   .0282475249   .700
 10    .110  .110  .110  .110  .110  .110  .110  .110  .110  .010  -     -     .0000000001   .0000000001   .100
 10    .150  .150  .150  .150  .150  .050  .050  .050  .050  .050  -     -     .0000009244   .0009765625   .500
 10*   .014  .171  .164  .184  .023  .095  .047  .205  .089  .008  -     -     .0000501840   .0111739516   .638
 12    .010  .070  .120  .160  .050  .020  .090  .040  .080  .110  .060  .190  .0000002256   .0009765625   .500

* This badly biased set of probabilities was used to produce the set of random decimal
digits tested in the next section.
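The "computed range" column can be checked numerically. The sketch below is an illustration under an assumption, not the authors' procedure: it takes the computed range to mean the difference between the largest and smallest probabilities of the sum of k = 10 independent digits reduced modulo the number base. The function names are hypothetical. Under that reading the base-2 row (.800, .200) and the base-4 row (.200, .100, .400, .300) give the tabulated values .0060466176 and .0000032768.

```python
def sum_mod_distribution(p, k):
    """Exact distribution of (d_1 + ... + d_k) mod len(p) for independent
    digits d_i that each take the value j with probability p[j]."""
    base = len(p)
    dist = [1.0] + [0.0] * (base - 1)      # the empty sum is 0 with certainty
    for _ in range(k):
        new = [0.0] * base
        for s, prob_s in enumerate(dist):
            for d, prob_d in enumerate(p):
                new[(s + d) % base] += prob_s * prob_d
        dist = new
    return dist


def probability_range(p, k=10):
    """Largest minus smallest probability of the derived digit (assumed
    meaning of 'computed range')."""
    dist = sum_mod_distribution(p, k)
    return max(dist) - min(dist)


if __name__ == "__main__":
    print(probability_range([0.8, 0.2]))            # base-2 row of Table 1
    print(probability_range([0.2, 0.1, 0.4, 0.3]))  # base-4 row of Table 1
```

The convolution is carried out exactly rather than by simulation, because the ranges in question are far smaller than any sampling error a simulation of reasonable size could resolve.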

TABLE 2

Digit              0     1     2     3     4     5     6     7     8     9

Generating set   .014  .171  .164  .184  .023  .095  .047  .205  .089  .008

Derived set      .088  .112  .086  .105  .113  .102  .101  .098  .097  .098

Frequency test (derived set): χ² = 7.0, P = .03

TABLE 3

 ith                        (i + 1)th digit
digit     0    1    2    3    4    5    6    7    8    9

  0      11    8    7    7    5    7   12   12   11    8
  1      10   13   15    9   11   14   11    8   10   11
  2      11   10    7   10   10    7    6    9    7    9
  3       9   10    3   14   12   17    9    8   11   12
  4       6   12   10   10   19    6   16   14   13    7
  5       9   17   11   14   10    6    5   15    6    9
  6       6   14    9    9   14   10   15    8    6   10
  7      13   10    9    9    8   11    7   12    7   12
  8       7    8    8   12    9   11   14    8   10   10
  9       6   10    7   11   15   13    6    4   16   10

χ² = 96.8        P = .90













In the serial test adjacent pairs of digits are tabulated. The distribution of 
these pairs in the derived set appears in Table 3. This test indicates that ad¬ 
jacent digits are independent. 
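The two χ² values quoted for the derived set can be recomputed from the pair counts. The sketch below is illustrative only: the counts are those of Table 3 as read here (a few cells that are ambiguous in the scan are taken as 6, the value that makes each row total agree with the digit frequencies of the derived set), each of the 100 cells is given the same expectation, and the function names are hypothetical.

```python
# Pair counts of Table 3: row = ith digit, column = (i + 1)th digit.
PAIR_COUNTS = [
    [11, 8, 7, 7, 5, 7, 12, 12, 11, 8],
    [10, 13, 15, 9, 11, 14, 11, 8, 10, 11],
    [11, 10, 7, 10, 10, 7, 6, 9, 7, 9],
    [9, 10, 3, 14, 12, 17, 9, 8, 11, 12],
    [6, 12, 10, 10, 19, 6, 16, 14, 13, 7],
    [9, 17, 11, 14, 10, 6, 5, 15, 6, 9],
    [6, 14, 9, 9, 14, 10, 15, 8, 6, 10],
    [13, 10, 9, 9, 8, 11, 7, 12, 7, 12],
    [7, 8, 8, 12, 9, 11, 14, 8, 10, 10],
    [6, 10, 7, 11, 15, 13, 6, 4, 16, 10],
]


def chi_square(observed, expected):
    """Pearson chi-square statistic for matching observed/expected lists."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def serial_chi_square(table):
    """Serial test: every ordered pair of digits is equally likely."""
    cells = [count for row in table for count in row]
    expected = sum(cells) / len(cells)
    return chi_square(cells, [expected] * len(cells))


def frequency_chi_square(table):
    """Frequency test applied to the row totals (counts of each digit)."""
    totals = [sum(row) for row in table]
    expected = sum(totals) / len(totals)
    return chi_square(totals, [expected] * len(totals))


if __name__ == "__main__":
    print(serial_chi_square(PAIR_COUNTS))
    print(frequency_chi_square(PAIR_COUNTS))
```

With these counts the serial statistic agrees with the value 96.8 printed under Table 3, and the row totals give the frequency statistic 7.0 quoted under Table 2.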

TABLE 4

Gap test

                                 Length of gap
Digit   Frequencies      0-1      2-4      5-7    8 and over     χ²       P

  0     Observed          16       18       11        42        1.25     .75
        Expected       16.53    19.10    13.92     37.45

  1     Observed          27       27       21        36        5.44     .15
        Expected       21.09    24.37    17.76     47.78

  2     Observed          16       17       10        42        1.90     .60
        Expected       16.15    18.66    13.60     36.59

  3     Observed          19       26       18        41         .90     .92
        Expected       19.76    22.83    16.64     44.77

  4     Observed          31       17       20        44        7.39     .06
        Expected       21.28    24.59    17.92     48.21

  5     Observed          15       21       15        50        2.04     .57
        Expected       19.19    22.17    16.16     43.48

  6     Observed          27       25       12        36        5.95     .12
        Expected       19.00    21.95    16.00     43.05

  7     Observed          20       19       16        42         .40     .93
        Expected       18.43    21.29    15.52     41.76

  8     Observed          14       19       21        42        3.27     .35
        Expected       18.24    21.07    15.36     41.32

  9     Observed          18       18       21        40        2.53     .48
        Expected       18.43    21.29    15.52     41.76




The gap test is based upon the distribution of lengths of intervals between 
given digits. A comparison of the number of gaps of specified lengths and the 
expected number in each case is presented in Table 4. The results of this test 
























are also in accord with the assumption of local randomness. Noting the badly 
biased probabilities of the initial set of digits, the results of these tests demon¬ 
strate the effectiveness of the compound randomization process. 
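The expected gap frequencies can be sketched as follows. This is an illustration under a stated assumption rather than the authors' computation: if the digits are independent and each has probability 1/10, a gap of length g between successive occurrences of a given digit has probability (.9)^g (.1), and the expected counts are these class probabilities multiplied by the observed number of gaps. The function names are hypothetical. For digit 0 (observed 16, 18, 11, 42) this reproduces the expected row 16.53, 19.10, 13.92, 37.45 and a χ² near 1.25.

```python
def gap_class_probabilities(p=0.1, bounds=((0, 1), (2, 4), (5, 7))):
    """Probabilities of the gap-length classes 0-1, 2-4, 5-7, 8 and over,
    assuming a geometric gap law with success probability p."""
    q = 1.0 - p
    probs = [q ** lo - q ** (hi + 1) for lo, hi in bounds]
    probs.append(q ** (bounds[-1][1] + 1))   # open-ended class: 8 and over
    return probs


def gap_chi_square(observed, p=0.1):
    """Expected class frequencies and Pearson chi-square for one digit."""
    total = sum(observed)
    expected = [total * prob for prob in gap_class_probabilities(p)]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return expected, chi2


if __name__ == "__main__":
    expected, chi2 = gap_chi_square([16, 18, 11, 42])   # digit 0 of Table 4
    print([round(e, 2) for e in expected], round(chi2, 2))
```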

The use of tabulating equipment for producing random decimal digits by addi¬ 
tion modulo 10 is relatively fast and simple. The authors have just completed 
production of a set of 105,000 digits in less than two days' tabulating time. 
75,000 cards, representing approximately 3 months' receipts of a current carload 
waybill study, were used to generate the digits, 14 non-correlated columns being 
added simultaneously. A chain of length 10 was used, although the nature of 
the initial data was such that a shorter length would probably have given satis¬ 
factory results. The derived set is now recorded on 1500 cards, 70 digits per 
card. Preliminary tests for local randomness confirm the random nature of the 
generating process. Upon completion of the tests this set will be reproduced in 
tabular form. 


REFERENCES 

[1] H. B. Horton, “A method for obtaining random numbers,” Annals of Math. Stat., Vol.
19 (1948), pp. 81-85.

[2] J. E. Walsh, “Concerning compound randomization in the binary system,” unpublished
manuscript, Project RAND, Douglas Aircraft Co., Santa Monica, California.

[3] R. von Mises, Probability, Statistics and Truth, The Macmillan Co., New York, 1939.

[4] M. G. Kendall and B. B. Smith, “Randomness and random sampling numbers,”
Roy. Stat. Soc. Jour., Vol. 101 (1938), pp. 147-166.

[5] M. G. Kendall and B. B. Smith, “Second paper on random sampling numbers,” Supp.
to Roy. Stat. Soc. Jour., Vol. 6 (1939), pp. 51-61.

[6] G. U. Yule, “A test of Tippett’s random sampling numbers,” Roy. Stat. Soc. Jour., Vol.
101 (1938), pp. 167-172.

[7] C. W. Vickery, “On drawing a random sample from a set of punched cards,” Supp. to
Roy. Stat. Soc. Jour., Vol. 6 (1939), pp. 62-66.



ON A MATCHING PROBLEM ARISING IN GENETICS 

By Howard Levene 
Columbia University 

1. Summary. A statistic useful for detecting deviations from the Hardy- 
Weinberg equilibrium in population genetics is discussed. Both exact and 
asymptotic distributions are given and a special case where there is misclassifica¬ 
tion is discussed. The distribution obtained also arises from a certain card 
matching problem. 

2. Introduction. A system of multiple alleles behaves as follows under
Mendelian inheritance: There are $r$ distinct forms or alleles, $a_1, \cdots, a_r$, of a
given gene. A given individual contains two genes and can be represented as
$a_i/a_j$. If $i = j$ the individual is called a homozygote; if $i \ne j$ it is called a
heterozygote. The representation $a_i/a_j$ is called the genotype. In reproduction
each gamete produced by an $a_i/a_j$ individual contains one gene which has a
probability 1/2 of being $a_i$ and 1/2 of being $a_j$. In fertilization a paternal and a
maternal gamete fuse to form a new individual which contains two genes, giving
the well-known Mendelian ratios. We now consider a large random breeding
population of $N$ individuals. This will contain $2N$ genes, of which the proportion
$q_i$ will be of type $a_i$ ($i = 1, \cdots, r$; $\sum q_i = 1$). The probability that a
random individual from the next generation will be $a_i/a_j$ is $q_i^2$ ($i = j$) or $2q_iq_j$ ($i \ne j$),
which are known as the Hardy-Weinberg equilibrium probabilities. The
statistical problem arose in testing (by means of a sample of $n$ individuals) the
hypothesis that this Hardy-Weinberg ratio holds against the alternative hypothesis
that disturbing forces decrease the number of homozygotes. The actual
data have been discussed elsewhere [1].


3. The sample distribution of number of homozygotes. We shall assume
throughout this paper that $N$ is so large that random fluctuations in the
population proportions from generation to generation can be ignored. Let
$x_{ij}$ ($i \le j = 1, \cdots, r$) be the number of $a_i/a_j$ individuals in the sample, and let
$y_i = 2x_{ii} + \sum_{j \ne i} x_{ij}$ (with $x_{ij} = x_{ji}$) be the number of $a_i$ genes in the sample.
We have $\sum_{i \le j} x_{ij} = n$ and $\sum y_i = 2n$. Let $h = \sum x_{ii}$ be the number of
homozygotes, and $z = n - h$ be the number of heterozygotes in the sample. The
probability of the observed sample is

$$P = \frac{n!}{\prod_{i \le j} x_{ij}!} \prod_i (q_i^2)^{x_{ii}} \prod_{i<j} (2q_iq_j)^{x_{ij}}. \tag{1}$$








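Since the $q_i$ are unknown we use the conditional probability when $y_1, \cdots, y_r$
are held constant. Whenever we use the word “conditional” hereafter, this
condition will be understood. The conditional probability is

$$K' \frac{n!\,2^z}{\prod_{i \le j} x_{ij}!}, \quad \text{where} \quad \frac{1}{K'} = {\sum}' \frac{n!\,2^z}{\prod_{i \le j} x_{ij}!}, \tag{2}$$

where the summation $\sum'$ is over all non-negative integral values of the $x_{ij}$
subject to the condition

$$2x_{ii} + \sum_{j \ne i} x_{ij} = y_i \quad (i = 1, \cdots, r).$$

Consider

$$\Big(\sum_i q_i\Big)^{2n} = \Big(\sum_i q_i^2 + 2\sum_{i<j} q_iq_j\Big)^n \tag{3}$$

$$= {\sum}^* \frac{n!\,2^z}{\prod_{i \le j} x_{ij}!} \prod_i q_i^{y_i}, \tag{4}$$

where the summation $\sum^*$ is over all non-negative values of the $x_{ij}$ subject
to the condition $\sum_{i \le j} x_{ij} = n$. Evidently $1/K'$ is the coefficient of $\prod q_i^{y_i}$ in (4);
but this must equal the coefficient of this term in the left member of (3); and
thus $1/K' = (2n)!/\prod y_i!$. Hence the conditional probability of the observed
sample is

$$\frac{n!\,2^z \prod_i y_i!}{(2n)!\,\prod_{i \le j} x_{ij}!}. \tag{5}$$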


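For any function $u(x_{11}, \cdots, x_{1r}, \cdots, x_{rr})$ we will now let $E(u)$ and $\sigma^2(u)$
denote the conditional mean and variance of $u$ for fixed $y_i$, and will refer to them
simply as the mean and variance. We first obtain the $s$th factorial moment of
$x_{ii}$, that is $E(x_{ii}^{(s)})$, where $x^{(s)} = x(x - 1) \cdots (x - s + 1)$. Consider

$$E(x_{ii}^{(s)}) = {\sum}' x_{ii}^{(s)} \frac{n!\,2^z \prod_k y_k!}{(2n)!\,\prod_{j \le k} x_{jk}!} = \frac{n^{(s)} \prod_k y_k!}{(2n)!} {\sum}' \frac{(n - s)!\,2^z}{\prod_{j \le k} x'_{jk}!}, \tag{6}$$

where $x'_{jk} = x_{jk}$ except that $x'_{ii} = x_{ii} - s$, and $\sum'$ has the same meaning as in (2).
The right member of (6) is evaluated exactly as before, giving

$$E(x_{ii}^{(s)}) = \frac{n^{(s)} y_i^{(2s)}}{(2n)^{(2s)}}. \tag{7}$$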


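From this expression we obtain

$$E(x_{ii}) = nf_i^2 + O(1), \tag{8}$$

and

$$\sigma^2(x_{ii}) = E(x_{ii}^{(2)}) + E(x_{ii}) - [E(x_{ii})]^2 = nf_i^2(1 - f_i)^2 + O(1), \tag{9}$$

where $f_i = y_i/2n$ is the sample estimate of $q_i$. Similarly

$$E(x_{ii}x_{jj}) = \frac{n^{(2)} y_i^{(2)} y_j^{(2)}}{(2n)^{(4)}}, \tag{10}$$

giving

$$\sigma(x_{ii}, x_{jj}) = \frac{n^{(2)} y_i^{(2)} y_j^{(2)}}{(2n)^{(4)}} - \frac{y_i^{(2)} y_j^{(2)}}{4(2n - 1)^2}. \tag{11}$$

Other moments can be similarly evaluated, in particular $E(x_{ij}) = y_iy_j/(2n - 1)$.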


4. Asymptotic distribution of number of homozygotes. From (8), (9), and
(11) we may easily obtain

$$E(h) = \sum_i E(x_{ii}) = (C - 2n)/(4n - 2), \tag{12}$$

$$\sigma^2(h) = \sum_i \sigma^2(x_{ii}) + 2\sum_{i<j} \sigma(x_{ii}, x_{jj}) \tag{13}$$

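$$= \frac{1}{4n}\left\{ C\left(1 + \frac{2}{n}\right) + C^2\,\frac{2n + 5}{8n^3} - D\,\frac{n + 2}{n^2} \right\} - \frac{1}{2} + O\!\left(\frac{1}{n}\right), \tag{14}$$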

where $C = \sum y_i^2$ and $D = \sum y_i^3$. The formula (14) is a close approximation to
(13) and is easily computed. From (5) by means similar to those classically
used to prove asymptotic normality of the binomial distribution we can prove
asymptotic normality of the conditional distribution of $h$; more precisely, if
$n \to \infty$ and $y_i/n \to$ constant ($i = 1, \cdots, r$), then

$$\text{Prob}\left\{ \frac{h - E(h)}{\sigma(h)} < t \right\} \to \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{t} e^{-x^2/2}\,dx. \tag{15}$$


5. Effect of misclassification. There is a further complication in the particular
case reported in [1]. All individuals of genotype $a_i/a_i$ are correctly classified,
but an individual of genotype $a_i/a_j$ ($i \ne j$) has a known probability $p/2$ of being
classified $a_i/a_i$ and an equal probability of being classified $a_j/a_j$. As a result,
the observed proportion of homozygotes is a biased estimate of the proportion in
the population. Let $h$, $x_{ij}$, $y_i$ denote the true sample values, and let $h'$, $x'_{ij}$, $y'_i$
denote the recorded sample values. Then $h^* = h' - e$, where $e = (n - h')p/(1 - p)$,
will give an unbiased estimate, i.e., $E(h^*) = E(h)$. In order to use $h^*$
we must have its (conditional) variance. Since $h^* = -np/(1 - p) + h'/(1 - p)$,

$$\sigma^2_{h^*} = [1/(1 - p)]^2 \sigma^2_{h'}.$$

Let $h' - h = \epsilon$; then for large fixed $(n - h)$, $\epsilon$ is approximately normally
distributed with mean $(n - h)p$ and variance

$$(n - h)p(1 - p) = [n - E(h)]p(1 - p)[1 + O_p(1/\sqrt{n})].$$

Neglecting the remainder term in this variance, $\epsilon$ and $h$ have a joint normal
distribution with parameters that are easily calculated. We thus have

$$\sigma^2_{h'} = \sigma^2_h + \sigma^2_\epsilon + 2\sigma(h, \epsilon), \quad \text{or} \quad \sigma^2_{h'} = [n - E(h)]p(1 - p) + (1 - p)^2\sigma^2_h,$$

giving

$$\sigma^2_{h^*} = \sigma^2_h + [n - E(h)]p/(1 - p). \tag{16}$$

In [1] $\sigma^2_{h^*}$ was given as $\sigma^2_h + e$ for the sake of simplicity. This would tend to be
smaller than (16), but only negligibly so. Strictly speaking the calculation of
$E(h)$ and $\sigma^2_h$ from (12) and (14) requires a knowledge of the true $y_i$, but the
observed $y'_i$ are unbiased estimates of the $y_i$ and their use should cause no
serious trouble.
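As a small worked illustration of the correction, the sketch below computes the unbiased count h* = h' - (n - h')p/(1 - p) and the variance of h* as the variance of h plus [n - E(h)]p/(1 - p). The function names and the numerical inputs are hypothetical and chosen only for the arithmetic; they are not data from [1].

```python
def corrected_homozygotes(n, h_recorded, p):
    """Unbiased count h* from the recorded count h', when each heterozygote
    is recorded as a homozygote with total probability p."""
    e = (n - h_recorded) * p / (1.0 - p)
    return h_recorded - e


def variance_h_star(var_h, mean_h, n, p):
    """Variance of h*: var(h) + [n - E(h)] p / (1 - p)."""
    return var_h + (n - mean_h) * p / (1.0 - p)


if __name__ == "__main__":
    # Hypothetical sample: n = 100 individuals, 40 recorded homozygotes, p = 0.2.
    print(corrected_homozygotes(100, 40, 0.2))
    # Hypothetical var(h) = 4 and E(h) = 25 for the same sample.
    print(variance_h_star(4.0, 25.0, 100, 0.2))
```

A quick consistency check on the first call: with 25 true homozygotes and 75 heterozygotes, the expected recorded count is 25 + 75(0.2) = 40, and the correction maps 40 back to 25.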

6. Combinatorial statement of the problem. This problem can also be
expressed as one of card matching as follows: A deck contains $2n$ cards of $r$
different suits, with $y_i$ cards of the $i$th suit ($i = 1, \cdots, r$). We draw $n$ pairs of
cards at random without replacement, exhausting the deck. What is the
distribution of $h$, the number of twins (pairs in which both members are of the
same suit)? If $z = n - h$, the probability of exactly $h$ twins is given by (5), and
in the limit $h$ is normally distributed with mean given by (12) and variance
given by (14). The card matching problem does not involve the notion of
conditional probability. By introducing variables $u_\alpha$ equal to one if the $\alpha$th
pair is a twin and zero otherwise, the moments of $h$ can also be obtained without
using generating functions.
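For a small deck the card-matching statement can be checked by brute force. The sketch below is an illustration, not part of the paper: it enumerates every way of splitting a labelled deck into unordered pairs (each equally likely when pairs are drawn at random without replacement), tabulates the number of twins h, and compares the mean with (C - 2n)/(4n - 2) from (12), where C is the sum of the squared suit counts. The function names and the example decks are hypothetical.

```python
from fractions import Fraction


def matchings(cards):
    """Yield every partition of the list `cards` into unordered pairs."""
    if not cards:
        yield []
        return
    first, rest = cards[0], cards[1:]
    for i, other in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for m in matchings(remaining):
            yield [(first, other)] + m


def twin_distribution(y):
    """Exact distribution of h, the number of same-suit pairs, for a deck
    with y[i] cards of suit i (sum(y) must be even)."""
    deck = [suit for suit, count in enumerate(y) for _ in range(count)]
    counts, total = {}, 0
    for m in matchings(deck):
        h = sum(1 for a, b in m if a == b)
        counts[h] = counts.get(h, 0) + 1
        total += 1
    return {h: Fraction(c, total) for h, c in sorted(counts.items())}


def mean_twins_formula(y):
    """E(h) = (C - 2n) / (4n - 2) with C = sum of y_i squared."""
    n = sum(y) // 2
    c = sum(v * v for v in y)
    return Fraction(c - 2 * n, 4 * n - 2)


if __name__ == "__main__":
    dist = twin_distribution([3, 2, 1])
    print(dist, sum(h * p for h, p in dist.items()), mean_twins_formula([3, 2, 1]))
```

Exact fractions are used so that the comparison with the formula is an equality rather than a floating-point approximation. The enumeration grows very quickly with deck size, so this is only practical for decks of a dozen cards or so.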


REFERENCE 

[1] Theodosius Dobzhansky and Howard Levene, “Genetics of natural populations.
XVII. Proof of operation of natural selection in wild populations of Drosophila
pseudoobscura,” Genetics, Vol. 33 (1948), pp. 537-547.



A MULTIPLE DECISION PROCEDURE FOR CERTAIN PROBLEMS IN 

THE ANALYSIS OF VARIANCE 

By Edward Paulson
University of Washington

1. Introduction. In this paper we will discuss a certain type of problem
which arises in many applications of the analysis of variance. We suppose 
that we are given K varieties, and are required to investigate the differences 
among them on the basis of the observed yields from a given experimental 
design, such as a set of randomized blocks or a latin square. The classical 
procedure [1] for dealing with this problem has been to test the null hypothesis 
that the K varieties are all equal by computing the ratio of the mean sum of 
squares between varieties to the residual mean sum of squares, and rejecting 
the null hypothesis whenever this ratio exceeded the critical value corresponding 
to the level of significance used. However, the standard discussions of this 
procedure seem to be quite vague on the question of what action should be taken 
after the null hypothesis has been rejected. 

In a number of problems, the practical situation seems to be such that instead 
of testing the null hypothesis that the varieties do not differ, what is really 
required is a statistical rule or “decision function” which on the basis of the
observed yields will classify the K varieties into a “superior” group and an
“inferior” group. If the superior group consists of more than one variety,
the next appropriate action will of course depend on the particular problem at 
hand. In some situations the varieties in the superior group might then be 
subject to further selection on the basis of some secondary characteristic, or 
additional observations might be taken to discriminate between the members 
of the superior group, after discarding the varieties in the inferior group. How¬ 
ever, if all varieties happen to be classified in one group, the group will be 
labelled “neutral” and this result is to be interpreted as implying that the 
varieties are homogeneous. 

In this formulation, the problem is now of a multiple decision type; it is 
necessary to decide on the basis of a sample which one out of the $2^K - 1$ possible
decisions (or classifications) to select. We will suggest a solution which seems 
quite reasonable on an intuitive basis, but it is still an open question whether 
this solution is an optimum one. 

2. A special case. In this section we will discuss the problem under the
assumption that the variance $\sigma^2$ of a single observation is known a priori. This
is a rather restrictive assumption, but it can be considered as approximately
satisfied when the number of degrees of freedom available for estimating the
variance is large, which will often be the case. The minor modifications necessary
to secure exact results for the small sample case when $\sigma$ is unknown are
discussed in section 3. We also assume that the experimental design has been so
selected that there will be the same number ($r$) of observations on each of the $K$
varieties.

Now let $x_{i\alpha}$ = the $\alpha$th observation on the $i$th variety ($i = 1, 2, \cdots, K$; $\alpha =
1, 2, \cdots, r$), let $\bar{x}_i = \sum_{\alpha=1}^{r} x_{i\alpha}/r$, put $m_i = E(\bar{x}_i)$ where $E$ stands for expected
value, and take $\lambda$ to be a given positive constant. The conventional assumption
is made that all the observations are normally and independently distributed
with the same variance $\sigma^2$. Denote by $\bar{x}_M$ the maximum of the $K$ mean values
$\bar{x}_1, \bar{x}_2, \cdots, \bar{x}_K$. The rule for dividing the varieties into superior and inferior
groups is the following: the superior group is to consist of all varieties whose
corresponding mean values fall in the interval $[\bar{x}_M - \lambda\sigma/\sqrt{r},\ \bar{x}_M]$ and the remaining
varieties constitute the inferior group. (As mentioned earlier, if all the varieties
fall into one group, this group is labelled ‘neutral’ and the varieties are considered
homogeneous.)
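The classification rule can be sketched in a few lines. This is an illustration only: the function name, the returned dictionary layout and the example means are hypothetical, and the standard deviation is assumed known, as in this section.

```python
from math import sqrt


def classify(means, lam, sigma, r):
    """Split varieties into superior and inferior groups: a variety is
    superior when its mean lies within lam * sigma / sqrt(r) of the largest
    mean. If every variety is superior the group is labelled neutral."""
    cutoff = max(means) - lam * sigma / sqrt(r)
    superior = [i for i, m in enumerate(means) if m >= cutoff]
    inferior = [i for i, m in enumerate(means) if m < cutoff]
    if not inferior:
        return {"neutral": superior}
    return {"superior": superior, "inferior": inferior}


if __name__ == "__main__":
    # Hypothetical variety means, r = 4 observations each, sigma = 1, lambda = 2.
    print(classify([10.0, 12.5, 12.0, 9.0], lam=2.0, sigma=1.0, r=4))
    print(classify([10.0, 10.2], lam=2.0, sigma=1.0, r=4))
```

In the first call the cutoff is 12.5 - 2(1)/2 = 11.5, so the varieties with means 12.5 and 12.0 form the superior group. In the second call both means lie within the allowance and the single group is neutral.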

This rule completely determines the classification as soon as $\lambda$ is determined.
For a given sample size, we might select $\lambda$ by considering the relative importance
of different types of incorrect classifications. If $H$ denotes the error of misclassifying
the varieties when in fact they are all equal, and $G$ denotes the error of
misclassifying the varieties when they actually are unequal, then it is obvious
that the greater the value of $\lambda$, the smaller the probability of an error of type $H$,
but the greater the probability of an error of type $G$. Therefore for a given
value of $r$ it is necessary to adopt some sort of compromise in selecting $\lambda$.

For a given value of $\lambda$ we will now derive explicit formulas for $P(H)$, the
probability of not classifying all the varieties in one group when $m_1 = m_2 = \cdots =
m_K$, and for $P(G_1)$, the probability that as a result of the experiment there will
not be a superior group consisting only of the $K$th variety when $m_1 = m_2 = \cdots =
m_{K-1} = m$ and $m_K = m + \Delta$ ($\Delta > 0$). $G_1$ was selected because it appeared to be
the particular kind of type $G$ error most likely to be useful in applications.
Also $P(G_1)$ may be regarded as the least upper bound of the probability of
misclassifying the varieties when one variety is superior to any of the others
by an amount at least equal to $\Delta$. Now if we denote by $W = (\bar{x}_M - \bar{x}_{\min})$
the difference between the maximum and minimum values of the set $\{\bar{x}_i\}$
($i = 1, 2, \cdots, K$), then it is obvious that

$$1 - P(H) = P(W \le \lambda\sigma/\sqrt{r}). \tag{2.1}$$

The right hand side of (2.1) is equivalent to the probability that the range of a
sample of $K$ independent observations from a normal distribution with unit
variance be less than $\lambda$; this probability has already been tabulated by Pearson
and Hartley [2]. From these tables it is a routine matter to find $P(H)$ corresponding
to a given value of $\lambda$, and conversely. To evaluate $P(G_1)$, we have

$$1 - P(G_1) = P\{\bar{x}_i < \bar{x}_K - \lambda\sigma/\sqrt{r} \ \text{ for each } i \ (i = 1, 2, \cdots, K - 1)\}.$$

By evaluating the probability of this event for a fixed value of $\bar{x}_K$ and then
integrating out with respect to $\bar{x}_K$, it is a simple matter to verify that

$$1 - P(G_1) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-y^2/2} \left[ \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{y + (\Delta/\sigma)\sqrt{r} - \lambda} e^{-t^2/2}\,dt \right]^{K-1} dy. \tag{2.2}$$

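In some applications, it may be desirable to have an explicit expression for the
probability that the superior group will consist of the $K$th variety and not more
than $s$ inferior varieties when $m_1 = m_2 = \cdots = m_{K-1} = m$ and $m_K = m + \Delta$.
If we denote this probability by $1 - P_s$, it is not difficult to show that

$$1 - P_s = \sum_{a=0}^{s} \binom{K-1}{a} \left[ T_{1a} + a\,T_{2a} \right], \tag{2.3}$$

where

$$T_{1a} = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-y^2/2} \left[ \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{y + (\Delta/\sigma)\sqrt{r} - \lambda} e^{-t^2/2}\,dt \right]^{K-a-1} \left[ \frac{1}{\sqrt{2\pi}} \int_{y + (\Delta/\sigma)\sqrt{r} - \lambda}^{y + (\Delta/\sigma)\sqrt{r}} e^{-t^2/2}\,dt \right]^{a} dy$$

and

$$T_{2a} = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-y^2/2} \left[ \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{y - \lambda} e^{-t^2/2}\,dt \right]^{K-a-1} \left[ \frac{1}{\sqrt{2\pi}} \int_{y - \lambda}^{y} e^{-t^2/2}\,dt \right]^{a-1} \left[ \frac{1}{\sqrt{2\pi}} \int_{y - (\Delta/\sigma)\sqrt{r} - \lambda}^{y - (\Delta/\sigma)\sqrt{r}} e^{-t^2/2}\,dt \right] dy.$$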


3. General case. We now briefly discuss the exact treatment of the problem
when $\sigma$ is unknown. The notation of section 2 will be used, but in addition
denote by $s^2$ an estimate of $\sigma^2$ resulting from the given experimental design
which is based on the residual sum of squares with $n$ degrees of freedom. It is
well known that $s^2$ is independent of the set $\{\bar{x}_i\}$ ($i = 1, 2, \cdots, K$). Now the
rule to be used in classifying the varieties into two groups is as follows: the
superior group is to consist of all those varieties whose mean values fall in the
interval $[\bar{x}_M - \lambda s/\sqrt{r},\ \bar{x}_M]$; the inferior group consists of the remaining
varieties.

We now find that:

$$1 - P(H) = P(W \le \lambda s/\sqrt{r}). \tag{3.1}$$

The right hand side of (3.1) depends only on the distribution of the ‘studentized’
range and has also been tabulated by Pearson and Hartley [3] although the
tabulation is considerably less complete than that of the range in [2]. It is also
easy to verify that the expression for $P(G_1)$ now becomes



with a similar modification for $P_s$.





4. Remarks. Any application of the ideas suggested here would be greatly 
facilitated if tables of P(Gi) were made available. If this were done, it would be 
possible to decide in advance of an experiment how large r should be in order 
to have a fixed control over both types H and Gi errors. It is obvious that 
further research both along theoretical and applied lines is needed. In conclusion,
the writer would like to thank Professor Albert Bowker for several helpful
suggestions.


REFERENCES 

[1] R. A. Fisher, Statistical Methods for Research Workers, Chapters 7, 8.

[2] E. S. Pearson and H. O. Hartley, “Tables of the probability integral of the range in
samples from a normal population,” Biometrika, Vol. 32 (1941-42), pp. 301-310.

[3] E. S. Pearson and H. O. Hartley, “Tables of the probability integral of the studentized
range,” Biometrika, Vol. 33 (1943), pp. 89-99.



A MODIFIED EXTREME VALUE PROBLEM 


By Benjamin Epstein¹

Coal Research Laboratory, Carnegie Institute of Technology

1. Introduction and summary. Consider the following problem. 

Particles are distributed over unit areas in such a way that the number of
particles to be found in such areas is a random variable following the law of
Poisson, with $\nu$ equal to the expected number of particles per unit area. Furthermore,
the particles themselves are assumed to vary in magnitude according
to a size distribution specified (independently of the particular unit area chosen)
by a d.f. $F(x)$ defined over some interval $a \le x \le b$, with $F(a) = 0$ and
$F(b) = 1$. The problem is to find the distribution of the smallest, largest, or
more generally the $n$th smallest or $n$th largest particle in randomly chosen
unit areas.

The problem as stated is not completely specified. To specify the distribution 
of smallest or largest particles in a unit area one must give a rule for dealing with 
those areas which contain no particles at all. More generally, in the case of the 
distribution of the nth smallest or nth largest particle, one must give a rule for 
dealing with those areas which contain (n — 1) or fewer particles. There are at 
least two possible alternatives. One alternative is to omit none of the areas 
from consideration by setting up the following rule: if no particles are found in a 
given unit area then this area will be considered as one for which the smallest size 
particle is x = b and for which the largest size particle is x = a. More generally,
if (n - 1) or fewer particles are found in a given unit area then this area will be
considered as one for which the nth smallest size particle is x = b and for which
the nth largest size particle is x = a. A second alternative is to restrict attention
to those areas which contain at least one particle (in the case of the distribution 
of smallest or largest values) or at least n particles (in the case of the distribution 
of the nth smallest or nth largest particle). In other words, this means finding 
the relevant conditional distribution. 

From the point of view of the application of the theory of extreme values to 
fracture problems, there are some situations where the first model and other 
situations where the second model is the more appropriate in describing the 
phenomenon under investigation. In this paper section 2 will be devoted to a 
derivation of the distributions associated with the first alternative; in section 3 
the conditional distributions will be described briefly. 

¹ Present address, Department of Mathematics, Wayne University, Detroit, Michigan.

2. The distributions under the first alternative. In this section we shall
be concerned with the first alternative. To find the distribution of the $n$th
smallest particle in unit areas, we first observe (the verification is left to the
reader) that under the hypotheses of section 1, the number of particles having
size $\le x$ in a unit area is distributed according to the law of Poisson, with
expected number equal to $\nu F(x)$. Next we note that the probability that the
$n$th smallest particle in a unit area exceeds $x$ in size is equal to the probability of
finding exactly 0, or exactly 1, or exactly 2, $\cdots$, or exactly $(n - 1)$ particles of
size $\le x$ in that area. Therefore $G_n(x)$, the probability that the $n$th smallest size
particle in a unit area is $\le x$, is given by

$$G_n(x) = \begin{cases} 1 - \sum_{j=0}^{n-1} e^{-\nu F(x)} \dfrac{[\nu F(x)]^j}{j!}, & x < b; \\ 1, & x \ge b, \end{cases} \tag{1}$$

where we have assigned to the size $x = b$ the probability $\sum_{j=0}^{n-1} e^{-\nu}\nu^j/j!$ which is
just equal to the probability of finding fewer than $n$ particles in a unit area.
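The distribution function of the nth smallest particle under the first alternative is easy to evaluate directly. The sketch below is illustrative: it reads (1) as one minus the Poisson probability of at most n - 1 particles of size not exceeding x, with mean ν F(x), and it uses a uniform size distribution on [0, 1] purely as a hypothetical example. The function names are not from the paper.

```python
from math import exp, factorial


def poisson_cdf(m, mean):
    """P(N <= m) for a Poisson variable N with the given mean."""
    return sum(exp(-mean) * mean ** j / factorial(j) for j in range(m + 1))


def nth_smallest_cdf(x, n, nu, F, b):
    """G_n(x) under the first alternative: areas with fewer than n particles
    are counted as having their nth smallest particle at x = b."""
    if x >= b:
        return 1.0
    return 1.0 - poisson_cdf(n - 1, nu * F(x))


def uniform_cdf(x):
    """Hypothetical size distribution: uniform on [0, 1]."""
    return min(max(x, 0.0), 1.0)


if __name__ == "__main__":
    nu = 2.0
    print(nth_smallest_cdf(0.5, 1, nu, uniform_cdf, b=1.0))           # 1 - exp(-1)
    jump = 1.0 - nth_smallest_cdf(1.0 - 1e-12, 2, nu, uniform_cdf, b=1.0)
    print(jump, poisson_cdf(1, nu))   # size of the step at x = b
```

The second line of output shows the step at x = b: the jump in the distribution function equals the probability of finding fewer than n particles in the area.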

If the d.f. $F(x)$ has a derivative $f(x)$ for all $x$ lying in $a \le x \le b$, then $G_n(x)$
has a derivative for any value of $x \ne b$. Therefore the probability density for
the $n$th smallest size particle is, for any $x \ne b$, given by the function $g_n(x)$ where

$$g_n(x) = \begin{cases} \dfrac{\nu e^{-\nu F(x)} [\nu F(x)]^{n-1}}{(n - 1)!} f(x), & a \le x < b; \\ 0, & x < a,\ x > b. \end{cases} \tag{2}$$

A finite probability $\sum_{j=0}^{n-1} e^{-\nu}\nu^j/j!$ is assigned to $x = b$.

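If one makes the transformation $y = \nu F(x)$ (for a similar transformation in
extreme value theory see [1, page 371]), then (1) and (2) become

$$G_n^*(y) = \begin{cases} 1 - \sum_{j=0}^{n-1} e^{-y} \dfrac{y^j}{j!}, & y < \nu; \\ 1, & y \ge \nu, \end{cases} \tag{1'}$$

and

$$g_n^*(y) = \begin{cases} \dfrac{e^{-y} y^{n-1}}{(n - 1)!}, & 0 \le y < \nu; \\ 0, & y < 0,\ y > \nu. \end{cases} \tag{2'}$$

A finite probability $\sum_{j=0}^{n-1} e^{-\nu}\nu^j/j!$ is assigned to $y = \nu$.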

The distribution of the smallest size particle in a randomly chosen area is 
found by letting n = 1 in equation 1. 

In a similar way one can find the distribution of the $n$th largest particle in a
randomly chosen unit area. $H_n(x)$, the probability that the $n$th largest size
particle in a unit area is $\le x$, is given by

$$H_n(x) = \begin{cases} 0, & x < a; \\ \sum_{j=0}^{n-1} e^{-\nu[1 - F(x)]} \dfrac{[\nu(1 - F(x))]^j}{j!}, & x \ge a, \end{cases} \tag{3}$$

where we have assigned to the size $x = a$ the probability $\sum_{j=0}^{n-1} e^{-\nu}\nu^j/j!$.

If, as before, $F(x)$ is assumed to have a derivative $f(x)$ for all $x$ lying in
$a \le x \le b$, then the probability density for the $n$th largest size particle is, for
any $x \ne a$, given by the function $h_n(x)$ where

$$h_n(x) = \begin{cases} \dfrac{\nu e^{-\nu[1 - F(x)]} [\nu(1 - F(x))]^{n-1}}{(n - 1)!} f(x), & a < x \le b; \\ 0, & x < a,\ x > b. \end{cases} \tag{4}$$

A finite probability $\sum_{j=0}^{n-1} e^{-\nu}\nu^j/j!$ is assigned to $x = a$.

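If one makes the transformation $z = \nu[1 - F(x)]$, then (3) and (4) become

$$H_n^*(z) = \begin{cases} 1 - \sum_{j=0}^{n-1} e^{-z} \dfrac{z^j}{j!}, & z < \nu; \\ 1, & z \ge \nu, \end{cases} \tag{3'}$$

and

$$h_n^*(z) = \begin{cases} \dfrac{e^{-z} z^{n-1}}{(n - 1)!}, & 0 \le z < \nu; \\ 0, & z < 0,\ z > \nu, \end{cases} \tag{4'}$$

with a finite probability $\sum_{j=0}^{n-1} e^{-\nu}\nu^j/j!$ assigned to $z = \nu$.

The distribution of the largest size particle in a randomly chosen unit area is
found by letting $n = 1$ in equation 3.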


3. Conditional distributions of the extreme values. The appropriate conditional
distributions for the problem under consideration can be written down
readily. The step function component which occurred in section 2 is no longer
present since we restrict our attention only to those areas which contain at least
$n$ particles (in the general case of the distribution of $n$th smallest or $n$th largest
size particles).

$\bar{G}_n(x)$, the d.f. of the $n$th smallest particle in a unit area chosen at random
from the class of areas containing at least $n$ particles, is given by

$$\bar{G}_n(x) = \begin{cases} 0, & x < a; \\ \dfrac{1 - \sum_{j=0}^{n-1} e^{-\nu F(x)} [\nu F(x)]^j/j!}{1 - \sum_{j=0}^{n-1} e^{-\nu}\nu^j/j!}, & a \le x \le b; \\ 1, & x > b. \end{cases} \tag{5}$$

Similarly $\bar{H}_n(x)$, the d.f. of the $n$th largest particle in a unit area chosen at random
from the class of areas containing at least $n$ particles, is given by

$$\bar{H}_n(x) = \begin{cases} 0, & x < a; \\ \dfrac{\sum_{j=0}^{n-1} e^{-\nu[1 - F(x)]} [\nu(1 - F(x))]^j/j! - \sum_{j=0}^{n-1} e^{-\nu}\nu^j/j!}{1 - \sum_{j=0}^{n-1} e^{-\nu}\nu^j/j!}, & a \le x \le b; \\ 1, & x > b. \end{cases} \tag{6}$$
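The conditional distribution functions can be checked at the end points of the size range, where they should equal 0 and 1 with no step. The sketch below is illustrative only: it reads (5) and (6) as the unconditional expressions with the probability of fewer than n particles removed and the remainder rescaled by the probability of at least n particles, and it again uses a hypothetical uniform size distribution on [0, 1]. The function names are not from the paper.

```python
from math import exp, factorial


def poisson_cdf(m, mean):
    """P(N <= m) for a Poisson variable N with the given mean."""
    return sum(exp(-mean) * mean ** j / factorial(j) for j in range(m + 1))


def cond_nth_smallest_cdf(x, n, nu, F, a, b):
    """d.f. of the nth smallest particle, given at least n particles."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (1.0 - poisson_cdf(n - 1, nu * F(x))) / (1.0 - poisson_cdf(n - 1, nu))


def cond_nth_largest_cdf(x, n, nu, F, a, b):
    """d.f. of the nth largest particle, given at least n particles."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    atom = poisson_cdf(n - 1, nu)          # P(fewer than n particles)
    return (poisson_cdf(n - 1, nu * (1.0 - F(x))) - atom) / (1.0 - atom)


def uniform_cdf(x):
    """Hypothetical size distribution: uniform on [0, 1]."""
    return min(max(x, 0.0), 1.0)


if __name__ == "__main__":
    for x in (0.0, 0.5, 1.0):
        print(x,
              cond_nth_smallest_cdf(x, 2, 3.0, uniform_cdf, 0.0, 1.0),
              cond_nth_largest_cdf(x, 2, 3.0, uniform_cdf, 0.0, 1.0))
```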

4. General remarks and an application. It is interesting to note that the
assumptions of section 1 lead to distribution functions in section 2 which are
precisely the same as the asymptotic distributions of smallest, largest, or $n$th
smallest, or $n$th largest values in samples of fixed size $N$ ($N \to \infty$) (see e.g.
[1, p. 371]). In the problem treated in this paper, $\nu$, the expected number of
particles in a unit area, plays the role of $N$ in the fixed sample size case, with the
important difference that the distributions in the present paper are exact and
not merely asymptotic.

The results of this paper have a direct bearing on certain aspects of fracture 
problems [2] and in particular on the dielectric breakdown of capacitors [3]. 
In the latter problem there appears to be ample justification for assuming that 
the breakdown voltage is influenced to a considerable degree by the presence of
flaws known in the technical literature as conducting particles. These particles 
are spread individually and collectively at random throughout the area of the 
capacitor and, depending on their size, create a local weakening of the capacitor 
by reducing the nominal insulation thickness in the neighborhood of flaws. 
The voltage required to break down the capacitor is equal to that required to
break it down at that spot where the greatest penetration has taken place. 

In the dielectric problem the statistical distribution of largest values appropriate
to the problem is given by (3) with $n = 1$, and the size distribution of
conducting particles follows a law of the form f(x) = x > 0. This is a
situation where all the capacitors under test are part of the sample (since all
must be tested to destruction) and those which happen to contain no defects (an
event with probability $e^{-\nu}$) act as if the largest particle size is equal to $a = 0$.
Here $e^{-\nu}$ simply represents the expected fraction of capacitors which have strength
equal to the theoretical strength of the insulation.

The conditional distributions of section 3 would be more appropriate in the 
following sort of practical situation. Suppose that surface flaws spread at 
random on glass rods are known to reduce greatly the strength of the rods. 
Suppose that in a given sample of glass rods one takes out by some method of 
inspection those specimens which have no flaws. Then the strength distribution 
of the remaining specimens is a conditional distribution since each specimen must 
contain at least one flaw to be eligible as a member of the sample. 





REFERENCES 

[1] H. Cramer, Mathematical Methods of Statistics, Princeton University Press, 1946.

[2] B. Epstein, “Statistical aspects of fracture problems,” J. Applied Phys., Vol. 19
(1948), pp. 140-147.

[3] B. Epstein and H. Brooks, “The theory of extreme values and its implications in the
study of the dielectric strength of paper capacitors,” J. Applied Phys., Vol. 19
(1948), pp. 544-550.



ON DISTINCT HYPOTHESES 


By Agnes Berger and Abraham Wald
Columbia University


1. Introduction. The following problem was suggested to one of the authors 
by Professor Neyman: 

Let $X = (X_1, X_2, \cdots, X_n)$ be a chance vector and let $h$ denote any simple
hypothesis specifying its distribution. Let $H_i$ be the composite hypothesis
that some element $h$ of a set of simple hypotheses $\{h\}_i$ ($i = 0, 1$) is true, and
assume that $H_0$ and $H_1$ are known to be exhaustive. Let $h_i$ denote an element of
$\{h\}_i$ ($i = 0, 1$).

For any region $W$ of the sample space $S$, let $P(W \mid h)$ be the probability that
the sample point falls in $W$ when $h$ is true.

We shall call $H_0$ and $H_1$ distinct, if a region $W$ exists for which

$$P(W \mid h_0) \ne P(W \mid h_1) \quad \text{for all } h_0 \in \{h\}_0 \text{ and all } h_1 \in \{h\}_1.$$

The problem is to establish necessary and sufficient conditions for two composite
hypotheses $H_0$ and $H_1$ to be distinct.

For any critical region $W$ for testing $H_0$ against $H_1$, let $\gamma(W \mid h)$ be the probability
of a wrong decision when $h$ is true, i.e.

$$\gamma(W \mid h) = \begin{cases} P(W \mid h) & \text{for } h \in H_0, \\ 1 - P(W \mid h) & \text{for } h \in H_1. \end{cases}$$

Suppose now that $H_0$ and $H_1$ are not distinct. Then to any $W$ a pair $h'_0$, $h'_1$
exist such that

$$P(W \mid h'_0) = P(W \mid h'_1),$$

thus

$$\gamma(W \mid h'_0) = 1 - \gamma(W \mid h'_1),$$

and therefore

$$\text{l.u.b.}_h\ \gamma(W \mid h) \ge \tfrac{1}{2} \quad \text{for any } W. \tag{1.1}$$

This property of non-distinct hypotheses leads us to investigate the conditions
under which 2 hypotheses allow a test where the maximum probability of a
wrong decision is $< \tfrac{1}{2}$.

The result, in turn, will enable us to state, for an important class of hypotheses 
a necessary and sufficient condition for 2 composite hypotheses to be distinct. 


2. A lemma. We shall now prove the following lemma:

Lemma 2.1. Assume that $X$ has a density function $p(x)$ and let $H_i = h_i$ be the
simple hypothesis that $p(x) = p_i(x)$, $(i = 0, 1)$. Assume that the set $R$ of $x$'s
satisfying $p_0(x) \neq p_1(x)$ has a positive measure. Then there exists a region $W$
such that $\gamma(W \mid p_i) < \tfrac{1}{2}$, $i = 0, 1$.

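Proof: Let $R_0$ be defined by $p_0 = p_1$, $R_1$ by $p_0 < p_1$, $R_2$ by $p_0 > p_1$. Since
$\int_S p_i(x)\,dx = 1$ and $p_i(x) \geq 0$, $(i = 0, 1)$, $R_1$ and $R_2$ are of positive measure.
Let

$\phi(x) = \begin{cases} p_1 & \text{in } R_1 \\ p_0 & \text{in } R_2 \\ p_1 = p_0 & \text{in } R_0. \end{cases}$

Then $\int_S \phi(x)\,dx > 1$ and either

a) $\int_{R_1 + R_0} p_1\,dx > \tfrac{1}{2}$ or b) $\int_{R_2} p_0\,dx > \tfrac{1}{2}$

or both. Assume first a).

Let $R_3 \subset R_1 + R_0$ and such that $\int_{R_3} p_1\,dx = \tfrac{1}{2}$ but $\int_{R_3} p_0\,dx < \tfrac{1}{2}$. This
can be done by including into $R_3$ a part of $R_1$ of non-zero measure. Let
$R_4 \subset R_1 + R_0 - R_3$ and such that $0 < \int_{R_4} p_1\,dx < \tfrac{1}{2} - \int_{R_3} p_0\,dx$. Then

$\int_{R_4} p_0\,dx \leq \int_{R_4} p_1\,dx < \tfrac{1}{2} - \int_{R_3} p_0\,dx$, thus $\int_{R_3 + R_4} p_0\,dx < \tfrac{1}{2}$ but $\int_{R_3 + R_4} p_1\,dx > \tfrac{1}{2}$.

Assume now b).

Let $R_5 \subset R_2$ and such that $\int_{R_5} p_0\,dx = \tfrac{1}{2}$. Then $\int_{R_5} p_1\,dx < \tfrac{1}{2}$.
Let $R_6 \subset R_2 - R_5$ and such that $0 < \int_{R_6} p_0\,dx < \tfrac{1}{2} - \int_{R_5} p_1\,dx$. Then

$\int_{R_5 + R_6} p_0\,dx > \tfrac{1}{2}$ and $\int_{R_5 + R_6} p_1\,dx < \tfrac{1}{2}$.

Thus in case a) $W = R_3 + R_4$, and in case b) $W = S - R_5 - R_6$ is a critical
region for which $\gamma(W \mid p_i) < \tfrac{1}{2}$ $(i = 0, 1)$. This proves the lemma.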
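
The conclusion of Lemma 2.1 can be checked numerically in a simple case. The sketch below is only an illustration, not the construction used in the proof: it assumes two normal densities with means 0 and 1 and unit variance (an example chosen here, not taken from the paper) and uses the region $W = \{x > 0.5\}$; the function names are hypothetical.

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal variate."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def error_probabilities(cutoff, mean0=0.0, mean1=1.0, sd=1.0):
    """Return (gamma(W | p0), gamma(W | p1)) for the critical region W = {x > cutoff}."""
    reject_true_h0 = 1.0 - normal_cdf(cutoff, mean0, sd)  # P(W | p0)
    accept_false_h0 = normal_cdf(cutoff, mean1, sd)       # 1 - P(W | p1)
    return reject_true_h0, accept_false_h0


# Assumed example: p0 = N(0, 1), p1 = N(1, 1), W = {x > 0.5}.
gamma0, gamma1 = error_probabilities(cutoff=0.5)
print(gamma0, gamma1)  # both error probabilities lie below 1/2
```

In this assumed example both probabilities of a wrong decision are equal and smaller than 1/2, as the lemma asserts must be possible for some region whenever the two densities differ on a set of positive measure.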


3. The main theorem. Assume now $X$ to have a density function $p(x, \theta)$
where $\theta = (\theta_1, \theta_2, \cdots, \theta_k)$ is an unknown parameter point. Let $\omega_0$ and $\omega_1$
be two disjoint, bounded and closed subsets of the $k$-dimensional $\theta$-space.
Let $\Omega = \omega_0 + \omega_1$ and suppose that $\theta$ is known to belong to it, which therefore
will be called the parameter space. Let $H_i$ be the hypothesis that the true
parameter point is an element of $\omega_i$, $(i = 0, 1)$.

We shall consider the problem of testing $H_0$ against $H_1$. Clearly, $P(W \mid h)$
can now be written as $P(W \mid \theta)$ and $\gamma(W \mid h)$ as $\gamma(W \mid \theta)$.

We shall make the following assumptions concerning $p(x \mid \theta)$:





Assumption 1. $p(x \mid \theta)$ is continuous in $\theta$. This is of course always fulfilled
if $\Omega$ consists only of a finite number of points.

Assumption 2. For any bounded domain $M$ of the sample space we have

$\int_M [\operatorname{Max}_\theta p(x \mid \theta)]\,dx < \infty$.

It follows from Assumptions 1 and 2 that

(3.1) $\lim_{r \to \infty} \int_{S - S_r} p(x \mid \theta)\,dx = 0$

uniformly in $\theta$, where $S_r$ is the sphere in the sample space with center at the
origin and radius $r$.

In what follows, whenever we shall speak of a cumulative distribution function
$g(\theta)$ in the $k$-dimensional parameter space, we shall always mean a cumulative
distribution function satisfying the condition

$\int_\Omega dg(\theta) = 1$.

For any c.d.f. $g(\theta)$ let $W_g$ denote a critical region which contains any sample
point $x$ satisfying the inequality

$\int_{\omega_1} p(x \mid \theta)\,dg(\theta) > \int_{\omega_0} p(x \mid \theta)\,dg(\theta)$,

and does not contain a sample point $x$ for which

$\int_{\omega_1} p(x \mid \theta)\,dg(\theta) < \int_{\omega_0} p(x \mid \theta)\,dg(\theta)$.

It can easily be verified that $W_g$ minimizes the average risk

(3.2) $\int_\Omega \gamma(W \mid \theta)\,dg(\theta)$, i.e., $\int_\Omega \gamma(W_g \mid \theta)\,dg(\theta) = \operatorname{Min}_W \int_\Omega \gamma(W \mid \theta)\,dg(\theta)$.

Let $\Omega_i$ $(i = 0, 1)$ be the class of all density functions $p(x) = \int_\Omega p(x \mid \theta)\,dg_i(\theta)$
where $g_i(\theta)$ is subject to the condition

$\int_{\omega_i} dg_i(\theta) = 1$.

Two density functions $p(x)$ and $q(x)$ are said to be equal if $p(x) \neq q(x)$ holds
only in a set of measure zero.

It follows from (3.1) and Assumptions 1 and 2 that $\gamma(W \mid \theta)$ is a continuous
function of $\theta$. Let $\gamma(W)$ denote the maximum of $\gamma(W \mid \theta)$ with respect to $\theta$.
We shall prove the following theorem:

Theorem 3.1. A necessary and sufficient condition for the existence of a region
$W$ such that $\gamma(W) < \tfrac{1}{2}$ is that the classes $\Omega_0$ and $\Omega_1$ be disjoint.





Proof. Suppose that $\Omega_0$ and $\Omega_1$ are not disjoint. Then there exist two
distribution functions $g_0(\theta)$ and $g_1(\theta)$ such that

$\int_{\omega_0} dg_0(\theta) = \int_{\omega_1} dg_1(\theta) = 1$

and

$\int_{\omega_0} p(x \mid \theta)\,dg_0(\theta) = \int_{\omega_1} p(x \mid \theta)\,dg_1(\theta)$

(except perhaps for points $x$ in a set of measure 0).

Let $g(\theta) = \tfrac{1}{2} g_0(\theta) + \tfrac{1}{2} g_1(\theta)$. Clearly, $\gamma(W) \geq \int_\Omega \gamma(W \mid \theta)\,dg(\theta) = \tfrac{1}{2}$ for
any $W$. This proves the necessity of our condition.

We shall now assume that $\Omega_0$ and $\Omega_1$ are disjoint. First we shall show that the
results of [1] can be applied. On pages 297-8 of [1] there are seven conditions
listed for the sequential case. For the non-sequential case (the one considered
here) the conditions 6 and 7 drop out and the first five conditions can be reduced
to the following conditions:

Condition 1: The weight function $W(\theta, d)$ is bounded.

Condition 2: For any $\theta$, the chance vector $X$ admits a density function $p(x \mid \theta)$.

Condition 3: For any sequence $\{\theta_i\}$ $(i = 1, 2, \cdots,$ ad inf.$)$ there exists a
subsequence $\{\theta_{i_j}\}$ $(j = 1, 2, \cdots)$ and a parameter point $\theta_0$ such that

$\lim_{j \to \infty} p(x \mid \theta_{i_j}) = p(x \mid \theta_0)$.

Condition 4: If $\{\theta_i\}$ $(i = 1, 2, \cdots)$ is a sequence of points and $\theta_0$ a point such that

$\lim_{i \to \infty} p(x \mid \theta_i) = p(x \mid \theta_0)$

then,

$\lim_{i \to \infty} W(\theta_i, d) = W(\theta_0, d)$

uniformly in $d$.

Condition 5: The same as our Assumption 2.

In our problem $d$ (the decision of the statistician) can take only two values:
acceptance or rejection of $H_0$. Condition 1 is evidently fulfilled, since $W(\theta, d) = 0$
if a correct decision is made, and $= 1$ if a wrong decision is made. Clearly,
Conditions 2-5 are also fulfilled in our problem.

A distribution $g(\theta)$ is said to be least favorable, if it maximizes the minimum
average risk, i.e., if it maximizes $\int_\Omega \gamma(W_g \mid \theta)\,dg(\theta)$ with respect to $g$.

It follows from Theorems 4.1 and 4.4 of [1] that there exists a least favorable
distribution.

Let $g^*(\theta)$ be a least favorable distribution. Then, as has been shown in [1],
there exists a $W_{g^*}$ such that





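(3.3) $\operatorname{Max}_\theta \gamma(W_{g^*} \mid \theta) = \int_\Omega \gamma(W_{g^*} \mid \theta)\,dg^*(\theta)$.

Thus, our theorem is proved if we can show that

(3.4) $\int_\Omega \gamma(W_{g^*} \mid \theta)\,dg^*(\theta) < \tfrac{1}{2}$.

Let $H_0^*$ be the hypothesis that the true density is given by

$p_0(x) = \dfrac{\int_{\omega_0} p(x \mid \theta)\,dg^*(\theta)}{\int_{\omega_0} dg^*(\theta)}$,

and $H_1^*$ the hypothesis that the true density is given by

$p_1(x) = \dfrac{\int_{\omega_1} p(x \mid \theta)\,dg^*(\theta)}{\int_{\omega_1} dg^*(\theta)}$.

Since $\Omega_0$ and $\Omega_1$ are disjoint, $p_0(x)$ and $p_1(x)$ are different density functions.
Hence, according to Lemma 2.1, there exists a critical region $W^*$ for testing $H_0^*$
such that $\alpha^* < \tfrac{1}{2}$ and $\beta^* < \tfrac{1}{2}$, where $\alpha^*$ is the probability of type I error, and
$\beta^*$ is the probability of type II error. Clearly,

(3.5) $\tfrac{1}{2} > \alpha^* \int_{\omega_0} dg^*(\theta) + \beta^* \int_{\omega_1} dg^*(\theta) = \int_\Omega \gamma(W^* \mid \theta)\,dg^*(\theta)$.

Since $W_{g^*}$ minimizes the average risk with respect to $g^*$, (3.4) follows from (3.5).
Hence, our theorem is proved.

It follows from (1.1) and Theorem 3.1 that if $H_0$ and $H_1$ are not distinct, $\Omega_0$ and $\Omega_1$ are not
disjoint.

On the other hand, suppose that $\Omega_0$ and $\Omega_1$ are not disjoint and let

$\int_{\omega_0} p(x \mid \theta)\,dg_0(\theta) = \int_{\omega_1} p(x \mid \theta)\,dg_1(\theta)$.

Then for every $W$

(3.6) $\int_{\omega_0} P(W \mid \theta)\,dg_0(\theta) = \int_{\omega_1} P(W \mid \theta)\,dg_1(\theta)$.

Assume now that $\omega_i$ is a connected set $(i = 0, 1)$. Then, because of the
continuity of $P(W \mid \theta)$ there exist 2 functions $\theta_0(W)$, $\theta_1(W)$, $\theta_i(W)$ belonging to
$\omega_i$ $(i = 0, 1)$, such that

$P(W \mid \theta_0(W)) = \int_{\omega_0} P(W \mid \theta)\,dg_0(\theta)$

and

$P(W \mid \theta_1(W)) = \int_{\omega_1} P(W \mid \theta)\,dg_1(\theta)$

for every $W$. Hence, because of (3.6),

$P(W \mid \theta_0(W)) = P(W \mid \theta_1(W))$

for every $W$. Thus, we arrive at the following theorem:

Theorem 3.2. If $\omega_i$ is a connected set $(i = 0, 1)$, then, under the assumptions
of Theorem 3.1, a necessary and sufficient condition for $H_0$ and $H_1$ to be distinct
is that the sets $\Omega_0$ and $\Omega_1$ be disjoint.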

REFERENCE 

[1] A. Wald, "Foundations of a general theory of sequential decision functions," Econometrica, Vol. 15 (1947), pp. 279-313.



AN APPROXIMATION TO THE SAMPLING VARIANCE OF AN ESTIMATED
MAXIMUM VALUE OF GIVEN FREQUENCY BASED ON FIT
OF DOUBLY EXPONENTIAL DISTRIBUTION OF MAXIMUM VALUES¹

By Bradford F. Kimball
N. Y. State Department of Public Service

1. Introduction. Given the doubly exponential distribution of maximum
values

(1) $F(x) = \exp(-e^{-y})$, $y = \alpha(x - u)$,

where $\alpha$ and $u$ are unknown parameters, with a prescribed frequency $F_0$ the
"reduced variate" $y$ is fixed, say at $y = y_0$. Thus with

$F_0 = .99$, $y_0 = 4.60015 \cdots$.

Given a sample of $n$ maximum values $x_i$, we are interested in the sampling
variance of

(2) $\hat{x} = g(\hat{u}, \hat{\alpha}) = \hat{u} + y_0/\hat{\alpha}$

due to sampling variations of the estimates $\hat{u}$ and $\hat{\alpha}$.

H. Fairfield Smith has recently pointed out to me that the examples of
applications of sufficient statistical estimation functions to this problem given in a
previous paper (see [1, pp. 307-309]) give too large a range for $x = g(\hat{u}, \hat{\alpha})$
because the sample points $(\hat{u}, \hat{\alpha})$ within the confidence region of the constant
probability ellipse apply to optimum estimates of $(u, \alpha)$ rather than to that of
$g = g(u, \alpha)$. What the problem calls for is the determination of the positions of
curves $\underline{g}(u, \alpha)$ and $\bar{g}(u, \alpha)$ such that the integral of the pdf of the estimation
functions over all sample values $(\hat{u}, \hat{\alpha})$ which lie between these two curves is
equal to the confidence level (taken as .95 in previous paper). Further
considerations of this being the shortest interval $\bar{g} - \underline{g}$ also come into play.

As so often happens in research, the previous analysis, although not giving the
final answer, suggests the next step. If we change our parameters to

(3) $g = g(u, \alpha) = u + y_0/\alpha$, $\alpha' = \alpha$

and are able to carry through the inverse of the maximum likelihood solution
for fitting of (1) to $n$ sample values $x_i$, then we shall be in a position to find the
asymptotic marginal distribution of $\sqrt{n}(\hat{g} - g)$, which will give the answer to our
problem (see [2]).

The Jacobian of this transformation of parameters is

$\partial(u, \alpha)/\partial(g, \alpha') = \begin{vmatrix} 1 & y_0/\alpha'^2 \\ 0 & 1 \end{vmatrix} = 1$,

and hence for $\alpha' > 0$ no new singularities are introduced.

¹ This involves a correction of a previous paper [1].



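2. The equations of the maximum likelihood solution. For a sample of
size $n$, the pdf of the sampling distribution in terms of the old parameters is
given by

$P[u, \alpha, O_n(x_i)] = \alpha^n \exp[-\Sigma e^{-\alpha(x_i - u)}] \exp[-\Sigma \alpha(x_i - u)]$,

and

$\log P = n \log \alpha - \Sigma e^{-\alpha(x_i - u)} - \alpha \Sigma x_i + n\alpha u$
$\quad = n[\log \alpha - e^{\alpha u}(\Sigma e^{-\alpha x_i}/n) - \alpha \bar{x} + \alpha u]$.

Now change to the new parameters and use the substitutions:

$z_i = e^{-\alpha' x_i}$, $z = (\Sigma z_i)/n$, $z_0 = e^{-\alpha' u} = e^{-\alpha' g + y_0}$.

Thus

$\partial z_0/\partial g = -\alpha' z_0$, $\partial z_0/\partial \alpha' = -g z_0$,

and denoting $\log P$ by $L$ we write

$L = n[\log \alpha' - z/z_0 - \alpha' \bar{x} + \alpha' g - y_0]$.

Hence

(4) $L_g = -n\alpha'[z/z_0 - 1]$;

(5) $L_{\alpha'} = n[1/\alpha' - \partial(z/z_0)/\partial\alpha' - \bar{x} + g]$.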


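3. Derivation of expected values needed. Recall that

$z/z_0 = e^{-y_0} \Sigma e^{-\alpha'(x_i - g)}/n = \Sigma e^{-\alpha(x_i - u)}/n$.

Hence

(6) $\partial(z/z_0)/\partial\alpha' = -e^{-y_0} \Sigma (x_i - g) e^{-\alpha'(x_i - g)}/n$,

$\partial(z/z_0)/\partial\alpha = -\Sigma (x_i - u) e^{-\alpha(x_i - u)}/n$;

(7) $\partial^2(z/z_0)/\partial\alpha'^2 = e^{-y_0} \Sigma (x_i - g)^2 e^{-\alpha'(x_i - g)}/n$,

$\partial^2(z/z_0)/\partial\alpha^2 = \Sigma (x_i - u)^2 e^{-\alpha(x_i - u)}/n$.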

By investigation of the generating function

$G(t) = E[\Sigma (z_i/z_0)^{1+t}/n]$, $z_i = e^{-\alpha x_i}$,

it can be shown that

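$E[\Sigma e^{-\alpha(x_i - u)}/n] = 1$,

$E[\Sigma (x_i - u) e^{-\alpha(x_i - u)}/n] = -(1/\alpha)\Gamma'(2) = -(1/\alpha)(1 - C)$,

where $C$ denotes Euler's constant, $.577216 \cdots$, and

$E[\Sigma (x_i - u)^2 e^{-\alpha(x_i - u)}/n] = (1/\alpha^2)\Gamma''(2) = (1/\alpha^2)(\pi^2/6 + C^2 - 2C)$.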





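Hence to find expected values of (6) and (7) we note that

$\partial(z/z_0)/\partial\alpha' = -\Sigma (x_i - g) e^{-\alpha(x_i - u)}/n$

$\quad = -\Sigma (x_i - u) e^{-\alpha(x_i - u)}/n + (y_0/\alpha) \Sigma e^{-\alpha(x_i - u)}/n$,

and therefore

(8) $E[\partial(z/z_0)/\partial\alpha'] = E[\partial(z/z_0)/\partial\alpha] + (y_0/\alpha) E(z/z_0)$.

Similar analysis shows that

(9) $E[\partial^2(z/z_0)/\partial\alpha'^2] = E[\partial^2(z/z_0)/\partial\alpha^2] + (2y_0/\alpha) E[\partial(z/z_0)/\partial\alpha] + (y_0^2/\alpha^2) E[z/z_0]$.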

4. The inverse of the maximum likelihood solution. It will first be noted
that the maximum likelihood equations (4) and (5) for determining best estimates
of $g$ and $\alpha'$ become identical to those for determining best estimates of old
parameters $u$ and $\alpha$, when the transformation of parameters (3) is applied to
them. This is easily verified by applying relations developed above.²

This means that the best estimates $\hat{g}$ and $\hat{\alpha}'$ obtained from (4) and (5) are related
to the best estimates of old parameters $\hat{u}$ and $\hat{\alpha}$ by

(10) $\hat{g} = \hat{u} + y_0/\hat{\alpha}$, $\hat{\alpha}' = \hat{\alpha}$.

We now proceed to set up the inverse of the maximum likelihood solution.
In order to do this we first need the variance-covariance matrix of the direct
solution. This is (see [2])

$\begin{pmatrix} E[-L_{gg}] & E[-L_{g\alpha'}] \\ E[-L_{\alpha' g}] & E[-L_{\alpha'\alpha'}] \end{pmatrix}$

Now

$L_{gg} = -n\alpha'^2 (z/z_0)$, $E[-L_{gg}] = n\alpha'^2$,

$L_{g\alpha'} = -n[z/z_0 - 1 + \alpha' \partial(z/z_0)/\partial\alpha']$, $E[-L_{g\alpha'}] = n(1 - C + y_0)$,

$L_{\alpha'\alpha'} = -n[1/\alpha'^2 + \partial^2(z/z_0)/\partial\alpha'^2]$,

$E[-L_{\alpha'\alpha'}] = (n/\alpha'^2)[\pi^2/6 + (1 - C + y_0)^2]$.

Thus the variance-covariance matrix of the estimation functions (4) and (5) is

$\begin{pmatrix} n\alpha'^2 & n(1 - C + y_0) \\ n(1 - C + y_0) & (n/\alpha'^2)[\pi^2/6 + (1 - C + y_0)^2] \end{pmatrix}$

The asymptotic form of the inverse solution for $\sqrt{n}(\hat{g} - g)$ and $\sqrt{n}(\hat{\alpha}' - \alpha')$
will have the variance-covariance matrix which is the reciprocal of the above
matrix, multiplied by $n$. The determinant value of the above matrix reduces to
$n^2(\pi^2/6)$. Thus the reciprocal matrix, adjusted by multiplying by $n$, is

² See equations (5.2) of [1] and note $+\partial(z/z_0)/\partial\alpha$ in second equation of (5.2) should
read $-\partial(z/z_0)/\partial\alpha$.





$\begin{pmatrix} (1/\alpha'^2)[1 + (1 - C + y_0)^2/(\pi^2/6)] & -(1 - C + y_0)/(\pi^2/6) \\ -(1 - C + y_0)/(\pi^2/6) & \alpha'^2/(\pi^2/6) \end{pmatrix}$

This gives the solution sought. From the general theory of the maximum
likelihood solution (see [2]) the distribution of $[\sqrt{n}(\hat{g} - g), \sqrt{n}(\hat{\alpha}' - \alpha')]$ is
asymptotically normal. Hence the marginal distribution of $\sqrt{n}(\hat{g} - g)$ will be
asymptotically normal, and for finite $n$, the standard deviation may be approximated
by

(12) $\sigma(\hat{g} - g) = \dfrac{1}{\sqrt{n}\,\alpha'} \sqrt{1 + \dfrac{(1 - C + y_0)^2}{\pi^2/6}}$.

Now the correlation coefficient for the asymptotic bivariate normal distribution
is seen to be

$r = -(1 - C + y_0)/\sqrt{\pi^2/6 + (1 - C + y_0)^2}$.

If $\alpha'$ were known, we should have the standard deviation of $(\hat{g} - g)$ reduced
by factor $\sqrt{1 - r^2}$. This is found to be equal to the reciprocal of the second
factor in the equation (12). Hence we conclude that if $\alpha'$ be known, the standard
deviation of $(\hat{g} - g)$, for finite $n$, is given approximately by

(13) $\sigma(\hat{g} - g) = 1/(\sqrt{n}\,\alpha')$.

5. An example. Using same example outlined in previous paper (see [1,
pp. 307-309]), we have $n = 57$, $\hat{\alpha}' = .01924$, $1 - C = .422784$, $y_0 = 4.60015$.
This gives $\sigma = 27.826$. For 95% confidence interval we take $(1.96)\sigma = 54.54$,
and with $\hat{u} = 180.6$,

$\hat{g} = \hat{u} + y_0/\hat{\alpha} = 419.7$,

and the interval is approximated by

$|\hat{g} - g| < 54.5$,

which as an approximation gives the symmetrical interval

$365.2 < g < 474.2$.

Method 4 used in previous paper gave the longer interval (see Introduction)
which was not symmetrical about $\hat{g}$:

$362.8 < g < 507.4$.
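
The arithmetic of this example can be retraced with a short script. The sketch below assumes that the standard deviation is the square root of the leading element of the reciprocal matrix of Section 4 divided by $n$, that is $\sigma = \sqrt{1 + (1 - C + y_0)^2/(\pi^2/6)}\,/(\sqrt{n}\,\alpha')$; the function and variable names are illustrative only.

```python
import math

EULER_C = 0.5772156649015329  # Euler's constant C


def reduced_variate(f0):
    """Reduced variate y0 solving F0 = exp(-exp(-y0))."""
    return -math.log(-math.log(f0))


def sd_of_g_hat(n, alpha, y0):
    """Approximate standard deviation of g-hat for a sample of size n."""
    k = 1.0 - EULER_C + y0
    return math.sqrt(1.0 + k * k / (math.pi ** 2 / 6.0)) / (math.sqrt(n) * alpha)


# Values quoted in the example: n = 57, alpha-hat = .01924, u-hat = 180.6.
n, alpha_hat, u_hat = 57, 0.01924, 180.6
y0 = reduced_variate(0.99)
sigma = sd_of_g_hat(n, alpha_hat, y0)
g_hat = u_hat + y0 / alpha_hat
half_width = 1.96 * sigma
print(round(y0, 5), round(sigma, 3), round(g_hat, 1), round(half_width, 2))
```

Under these assumptions the script reproduces the quoted figures to the precision given in the text: $y_0 \approx 4.60015$, $\sigma \approx 27.826$, $\hat{g} \approx 419.7$ and a half-width of about 54.54.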

REFERENCES 

[1] B. F. Kimball, "Sufficient statistical estimation functions for the parameters of the distribution of maximum values," Annals of Math. Stat., Vol. 17 (1946), pp. 299-309.

[2] S. S. Wilks, Mathematical Statistics, Princeton Univ. Press, 1943, p. 139.




NOTES 

This section is devoted to brief research and expository articles and other short items. 


TESTS OF INDEPENDENCE IN CONTINGENCY TABLES 
AS UNCONDITIONAL TESTS 

By A. M. Mood
Iowa State College¹

Summary and introduction. Since the ordinary tests for independence in 
contingency tables use test criteria whose distributions depend on unknown 
parameters, the justification for the tests is usually made either by an appeal to 
asymptotic theory or by interpreting the tests as conditional tests. The latter 
approach employs the conditional distribution of the cell frequencies given the 
marginal totals, and was first described by Fisher [1]. The purpose of the 
present note is to show how these tests may be regarded as unconditional tests 
even though the parameters are unknown by augmenting the test criterion to 
include estimates of the unknown parameters. We present no new tests, 
merely a new setting for the old tests which seems to put them in a little better 
light. 

1. Certain conditional tests. A variate or set of variates $x$ has a probability
density function $f(x; \theta)$ under a null hypothesis involving a parameter or set of
parameters $\theta$. When the parameters have a set of sufficient estimators $\hat{\theta}$, the
joint density function of a random sample of size $n$ may be put in the form

(1) $\prod_{i=1}^{n} f(x_i; \theta) = g(x_1, x_2, \cdots, x_n \mid \hat{\theta})\,h(\hat{\theta}; \theta)$.

It is assumed that $n$ exceeds the number of parameters. We shall be concerned
with the class of test criteria which are not functions of the estimators alone.
Let $\lambda(x_1, x_2, \cdots, x_n)$ be a test criterion which may not be put in the form $\lambda(\hat{\theta})$.
The joint density function for $\lambda$ and $\hat{\theta}$, obtained by summing (1) for fixed $\lambda$ and
$\hat{\theta}$, will be of the form

(2) $k(\lambda \mid \hat{\theta})\,h(\hat{\theta}; \theta)$.

The marginal distribution of $\lambda$ will be denoted by $m(\lambda; \theta)$, the result of summing
(2) over $\hat{\theta}$ for fixed $\lambda$.

In order to test the hypothesis in question one would like to divide the $\lambda$
space into two regions, an acceptance region $S_a$ and a critical region $S_c$ in such a
way that $S_c$ would have a prescribed size $\alpha$ under the null hypothesis. One
would of course set up other specifications to be fulfilled by $S_c$, but we are

¹ The author is now with The RAND Corporation, Santa Monica, California.



interested here only in the fact that the size of $S_c$ cannot be determined because
of the presence of the unknown parameters $\theta$ in $m(\lambda; \theta)$.

One can set up a conditional test by using the conditional distribution $k(\lambda \mid \hat{\theta})$.
That is, for fixed $\hat{\theta}$ the measure of any region $R(\hat{\theta})$ (which is measurable relative
to $k(\lambda \mid \hat{\theta})$, say, in the Lebesgue-Stieltjes sense) of the $\lambda$ space is known because
the $\hat{\theta}$ are known in any given instance. Thus a conditional test can be made
with a critical region $R_c(\hat{\theta})$ of prescribed size.

The conditional test may be interpreted as an unconditional test in the present
instance in the following manner: the unconditional test is made by using the
double criterion $(\lambda, \hat{\theta})$. The $(\lambda, \hat{\theta})$ space is divided into two regions, $T_a$ for
acceptance and $T_c$ for rejection. The critical region $T_c$ consists of all points
$(\lambda, \hat{\theta})$ such that $\lambda$ is contained in $R_c(\hat{\theta})$. If the size of $R_c(\hat{\theta})$ is $\alpha$ for all $\hat{\theta}$, then
the size of $T_c$ is also $\alpha$, for

(3) $\iint_{T_c} k(\lambda \mid \hat{\theta})\,h(\hat{\theta}; \theta)\,d\lambda\,d\hat{\theta} = \int \left[\int_{R_c(\hat{\theta})} k(\lambda \mid \hat{\theta})\,d\lambda\right] h(\hat{\theta}; \theta)\,d\hat{\theta} = \int \alpha\,h(\hat{\theta}; \theta)\,d\hat{\theta} = \alpha$.

In this way one can make an unconditional test of the hypothesis with a critical
region of prescribed size; of course one does not have complete freedom to
specify the shape of $T_c$, but he can control it to the extent that $R_c(\hat{\theta})$ may be
chosen arbitrarily for every $\hat{\theta}$. $T_c$ is of course a similar region in the sense of
Neyman and Pearson [2, 3, 4] for the augmented criterion, and the construction
of $T_c$ is essentially the same as that used by Neyman and Pearson to test
parameters with sufficient estimators.


2. Application to contingency tables. As an illustration we shall follow
Wilks' [5] treatment of a two-way table with $r$ rows and $c$ columns; the cell
frequencies are $n_{ij}$ and the cell probabilities are $p_{ij}$ with

(4) $\Sigma n_{ij} = n$; $\Sigma p_{ij} = 1$; $i = 1, 2, \cdots, r$; $j = 1, 2, \cdots, c$.

The sample is thus regarded as having come from a multinomial population.
We let

(5) $p_{i\cdot} = \sum_j p_{ij}$; $p_{\cdot j} = \sum_i p_{ij}$; $n_{i\cdot} = \sum_j n_{ij}$; $n_{\cdot j} = \sum_i n_{ij}$.

The null hypothesis $H_0$ (of independence) corresponds to the subspace for which

(6) $p_{ij} = p_i q_j$; $\sum_i p_i = 1 = \sum_j q_j$

in the parameter space of the $p_{ij}$. The likelihood ratio criterion for testing $H_0$ is

(7) $\lambda = \dfrac{(\prod_i n_{i\cdot}^{n_{i\cdot}})(\prod_j n_{\cdot j}^{n_{\cdot j}})}{n^n \prod_{i,j} n_{ij}^{n_{ij}}}$

and its distribution depends on the unknown parameters $p_i$ and $q_j$. However
the parameters have sufficient estimators

(8) $\hat{p}_i = n_{i\cdot}/n$, $\hat{q}_j = n_{\cdot j}/n$

for the marginal distribution of the $n_{i\cdot}$ and $n_{\cdot j}$ is

(9) $\dfrac{(n!)^2}{(\prod_i n_{i\cdot}!)(\prod_j n_{\cdot j}!)} (\prod_i p_i^{n_{i\cdot}})(\prod_j q_j^{n_{\cdot j}})$

and when this is divided into the distribution of the $n_{ij}$ (under the null
hypothesis) one finds the conditional distribution of the $n_{ij}$ to be

(10) $g(n_{11}, n_{12}, \cdots, n_{rc} \mid n_{1\cdot}, n_{2\cdot}, \cdots, n_{\cdot c}) = \dfrac{(\prod_i n_{i\cdot}!)(\prod_j n_{\cdot j}!)}{n! \prod_{i,j} n_{ij}!}$
nlUnijl 


which is independent of the parameters. The distribution (10) is just the
combinatorial distribution used ordinarily in deriving the distribution of $\lambda$
for small samples. The test for independence is therefore a conditional test
which however may be interpreted as an unconditional test if the criterion $\lambda$ is
augmented by the estimators of the parameters under the null hypothesis.
Instead of the likelihood ratio criterion Karl Pearson's Chi-square criterion
could just as well have been used since its conditional distribution is also
determined by (10).
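
As a small check on the conditional distribution (10), the sketch below evaluates $(\prod n_{i\cdot}!)(\prod n_{\cdot j}!)/(n! \prod n_{ij}!)$ exactly and sums it over every 2 × 2 table sharing the same marginal totals. The margins (3, 4) and (2, 5) and the helper names are assumptions chosen only for illustration.

```python
from fractions import Fraction
from math import factorial


def conditional_probability(table):
    """Probability (10) of an r x c table of counts, given its marginal totals."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    numerator = 1
    for total in row_totals + col_totals:
        numerator *= factorial(total)
    denominator = factorial(n)
    for row in table:
        for count in row:
            denominator *= factorial(count)
    return Fraction(numerator, denominator)


def tables_2x2(row_totals, col_totals):
    """Generate all 2 x 2 tables of non-negative counts with the given margins."""
    (r1, r2), (c1, c2) = row_totals, col_totals
    for n11 in range(max(0, r1 - c2), min(r1, c1) + 1):
        yield [[n11, r1 - n11], [c1 - n11, r2 - c1 + n11]]


# Hypothetical margins: row totals (3, 4), column totals (2, 5), so n = 7.
probabilities = [conditional_probability(t) for t in tables_2x2((3, 4), (2, 5))]
total = sum(probabilities)
print(probabilities, total)
```

No cell probability enters the computation, and the probabilities of the admissible tables add to exactly 1, which is what allows a critical region $R_c(\hat{\theta})$ to be sized without knowledge of the parameters.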

The usual difficulty due to discreteness arises in this application to contingency
tables. It is not possible to make the significance level exactly $\alpha$. In terms
of the notation of the first section, $R_c(\hat{\theta})$ cannot be chosen so that it will have
size exactly equal to $\alpha$ for all $\hat{\theta}$. One would ordinarily replace the equalities by
inequalities. The $R_c(\hat{\theta})$ would be chosen to have size less than but as close to $\alpha$
as possible. The size of $T_c$ is then unspecified and one can only state that his
significance level is less than $\alpha$. This difficulty is not particularly serious in
practice unless the test criterion has only one degree of freedom.


REFERENCES 

[1] R. A. Fisher, Statistical Methods for Research Workers, Oliver and Boyd, London, 1946, pp. 96, 97.

[2] J. Neyman and E. S. Pearson, "On the problem of the most efficient tests of statistical hypotheses," Roy. Soc. Phil. Trans., Series A, Vol. 231 (1933), p. 289.

[3] J. Neyman and E. S. Pearson, "Sufficient statistics and uniformly most powerful tests of statistical hypotheses," Stat. Res. Memoirs, Vol. 1 (1936), p. 113.

[4] J. Neyman, "Outline of a theory of statistical estimation based on the classical theory of probability," Roy. Soc. Phil. Trans., Series A, Vol. 236 (1937), p. 364.

[5] S. S. Wilks, Mathematical Statistics, Princeton University Press, 1943, pp. 213-220.





THE 5% SIGNIFICANCE LEVELS FOR SUMS OF SQUARES 
OF RANK DIFFERENCES AND A CORRECTION 


By Edwin G. Olds 
Carnegie Institute of Technology 


About ten years ago this author published a paper [1], containing tables for
use in testing the significance of the rank correlation coefficient. In a paper on
non-parametric tests, [2, p. 316] Scheffé remarks that it would be desirable
to have these tables extended by inclusion of the 5% values. When the
computation was begun it was noted that a necessary formula was given incorrectly.
The main purpose of this note is to correct the formula and to extend Table V,
[1, p. 148]. Incidentally, a minor addition for Table III, [1, p. 143] will be
supplied.

The formula for the rank correlation coefficient, $r'$, is given by

$r' = 1 - \dfrac{6 \Sigma d^2}{n^3 - n}$,

where $n$ is the number of individuals ranked and $\Sigma d^2 = \sum_{i=1}^{n} d_i^2$ ($d_i$ being the rank
difference for the $i$th individual). As noted in the original paper, the null
hypothesis, $r' = 0$, is equivalent to the hypothesis $\Sigma d^2 = (n^3 - n)/6$, and the
latter hypothesis is slightly more convenient to test. Scheffé's remark seems
to be directed at Table V, which gives, for $11 \leq n \leq 30$, pairs of values between
which $\Sigma d^2$ has a probability, $P$, of being included. Values are tabled for $P = .99$,
.98, .96, .90 and .80. The necessary values for $P = .95$ are given below and
can easily be copied in the left-hand margin of the original Table [1, p. 148].
These values, as in the previous case, have been calculated by using the fact that

$x = \dfrac{\Sigma d^2}{2} - \dfrac{n^3 - n}{12}$

has an approximately normal distribution with a mean of zero and a variance of
$(n - 1)[n(n + 1)/12]^2$. In the original paper, [1, p. 142] the denominator
in the bracketed part of the variance was printed as 6, instead of 12.
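
The tabled limits can be recomputed from the normal approximation. The sketch below assumes that the approximately normal variable is $x = \Sigma d^2/2 - (n^3 - n)/12$ with variance $(n - 1)[n(n + 1)/12]^2$, and that the two-sided 5% point of the normal curve is taken as 1.96; the function name is illustrative.

```python
import math


def rank_difference_limits(n, z=1.96):
    """Limits between which the sum of squared rank differences falls with probability about .95."""
    mean_x = (n ** 3 - n) / 12.0                    # mean of (sum d^2) / 2 under the null hypothesis
    sd_x = math.sqrt(n - 1) * n * (n + 1) / 12.0    # standard deviation of (sum d^2) / 2
    # Doubling converts limits for (sum d^2) / 2 into limits for sum d^2 itself.
    return 2.0 * (mean_x - z * sd_x), 2.0 * (mean_x + z * sd_x)


for n in (11, 20, 30):
    lower, upper = rank_difference_limits(n)
    print(n, round(lower, 1), round(upper, 1))
```

Under these assumptions the computed pairs for $n = 11$, 20 and 30 agree with the corresponding rows of the extended Table V to the one decimal place tabled.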

In this author's original paper the exact frequencies of sums of squares of
rank differences were given for $n = 2$ to $n = 7$ inclusive, [1, p. 139]. The same
results, together with the results for $n = 8$, were obtained (independently) by
Kendall and others and published some months later, [3, p. 255]. Therefore,
it is possible to extend slightly the comparison of approximating functions
given in Table III, [1, p. 143]. Using Kendall's results for $n = 8$ it is found
that when the approximations obtained by using a Pearson Type II curve are
compared with exact results the average and maximum differences of cumulatives
are .0013 and .0067 respectively. When approximations are made by using the
normal curve the corresponding errors are .0081 and .0163.





REFERENCES 

[1] E. G. Olds, "Distribution of the sums of squares of rank differences for small numbers of individuals," Annals of Math. Stat., Vol. 9 (1938), pp. 133-148.

[2] H. Scheffé, "Statistical inference in the non-parametric case," Annals of Math. Stat., Vol. 14 (1943), pp. 305-332.

[3] M. G. Kendall, Sheila F. H. Kendall and B. Babington Smith, "The distribution of Spearman's coefficient of rank correlation in a universe in which all rankings occur an equal number of times," Biometrika, Vol. 30 (1939), pp. 251-273.


TABLE V (Extended)

Pairs of values between which $\Sigma d^2$ has a probability, P, of being included

 n          P = .95
11       83.6      356.4
12      117.0      455.0
13      158.0      570.0
14      207.7      702.3
15      266.7      853.3
16      335.9     1024.1
17      416.2     1215.8
18      508.4     1429.6
19      613.3     1666.7
20      732.0     1928.0
21      865.1     2214.9
22     1013.5     2528.5
23     1178.2     2869.8
24     1360.0     3240.0
25     1559.8     3640.2
26     1778.4     4071.6
27     2016.7     4535.3
28     2275.7     5032.3
29     2556.2     5563.8
30     2859.0     6131.0





INDEPENDENCE OF NON-NEGATIVE QUADRATIC FORMS IN 
NORMALLY CORRELATED VARIABLES 

By Bertil Matérn

Forest Research Institute, Experimentalfältet, Sweden

In a recent paper by the author [5] the following theorem has been mentioned 
without proof. Though the theorem is very simple and easy to prove the 
author has not found it elsewhere in the literature. 

Theorem. If two non-negative quadratic forms in normally correlated variables
with zero means are uncorrelated the two forms are independent.

To prove the theorem, let the two forms be

(1) $Q_1 = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} x_i x_j$, $Q_2 = \sum_{i=1}^{n} \sum_{j=1}^{n} b_{ij} x_i x_j$,

where the $x_i$ are normally correlated and all have mean 0. By a well-known
theorem on quadratic forms we can reduce $Q_1$ and $Q_2$ to the forms

(2) $Q_1 = \sum_{i=1}^{n} c_i y_i^2$, $Q_2 = \sum_{i=1}^{n} d_i z_i^2$,

where the $y_i$'s and $z_i$'s are linear functions of the $x_i$'s. In the $2n$-dimensional
normal distribution of the $y_i$'s and the $z_i$'s, let $\rho_{ij}$ be the covariance of $y_i$ and
$z_j$. It is then easily shown that the covariance of $y_i^2$ and $z_j^2$ is $2\rho_{ij}^2$, and hence
that

(3) $\operatorname{cov}(Q_1, Q_2) = 2 \sum_{i=1}^{n} \sum_{j=1}^{n} c_i d_j \rho_{ij}^2$.

As the forms are supposed to be non-negative all coefficients in (2) are
non-negative. If $Q_1$ and $Q_2$ are uncorrelated, each term on the right hand of (3)
must vanish. Consequently, if $c_i \neq 0$ and $d_j \neq 0$, we must have $\rho_{ij} = 0$. This
means that all $y_i$'s in $Q_1$ with non-zero coefficients are independent of all $z_j$'s in
$Q_2$ with non-zero coefficients. Hence $Q_1$ and $Q_2$ are independent. Q.E.D.

To see if $Q_1$ and $Q_2$ are uncorrelated we need an expression for the covariance
of the two forms in terms of the coefficients in (1) and the variances and
covariances of the original variables $x_i$. Let $A$ and $B$ be the matrices of the two
forms (1). Clearly we may suppose $A$ and $B$ to be symmetric. Let the
variance-covariance matrix of the $x_i$'s be $L$. By straightforward calculations we find

(4) $\operatorname{cov}(Q_1, Q_2) = 2 \operatorname{Tr} ALBL$.

Here we have used $\operatorname{Tr} M$ to denote the "trace," i.e. the sum of the diagonal
elements in a square matrix $M$. In case of independent variables with variance 1,
we get

(5) $\operatorname{cov}(Q_1, Q_2) = 2 \operatorname{Tr} AB$.

The formulae (4) and (5) are given in [5]. 
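
Formula (4) can be checked against a direct fourth-moment calculation. The sketch below assumes the standard product-moment rule for zero-mean normal variables, $E[x_i x_j x_k x_l] = L_{ij}L_{kl} + L_{ik}L_{jl} + L_{il}L_{jk}$, which gives $\operatorname{cov}(x_i x_j, x_k x_l) = L_{ik}L_{jl} + L_{il}L_{jk}$; the matrices and helper names are illustrative choices, not taken from the paper.

```python
def matmul(p, q):
    """Product of two matrices given as lists of rows."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q))) for j in range(len(q[0]))]
            for i in range(len(p))]


def trace(m):
    """Sum of the diagonal elements of a square matrix."""
    return sum(m[i][i] for i in range(len(m)))


def cov_by_trace(a, b, cov):
    """Covariance of the two quadratic forms from formula (4): 2 Tr(A L B L)."""
    return 2 * trace(matmul(matmul(matmul(a, cov), b), cov))


def cov_by_moments(a, b, cov):
    """The same covariance summed term by term from the normal fourth moments."""
    n = len(cov)
    total = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                for l in range(n):
                    total += a[i][j] * b[k][l] * (cov[i][k] * cov[j][l] + cov[i][l] * cov[j][k])
    return total


# Illustrative symmetric matrices A, B and covariance matrix L.
A = [[2, 1], [1, 1]]
B = [[1, 0], [0, 3]]
L = [[2, 1], [1, 3]]
identity = [[1, 0], [0, 1]]
print(cov_by_trace(A, B, L), cov_by_moments(A, B, L), cov_by_trace(A, B, identity))
```

With integer entries both routes give exactly the same value, and replacing $L$ by the identity matrix reduces the result to $2 \operatorname{Tr} AB$, in line with (5).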





It is interesting to note the simplification of the independence condition given
in [2, 3] which is possible when the forms are assumed to be non-negative. It
may also be of interest to note that the condition for independence given in
the present theorem is identical with the corresponding condition for two linear
forms. (In fact, the latter condition has been used in the above proof.) Further
we observe that if $Q_2$ is the square of a linear form with mean 0, we get a necessary
and sufficient condition for independence between a linear form and a
non-negative quadratic form. The corresponding condition when $Q_1$ is not supposed
to be non-negative has been given in [4].

As an application consider a quadratic form $Q$ in normally correlated variables.
Let it be known that $Q$ has a $\chi^2$-distribution with $f$ degrees of freedom. If
further

(6) $Q = Q_1 + Q_2 + \cdots + Q_k$,

where the $Q_i$'s are non-negative and mutually uncorrelated quadratic forms,
then each $Q_i$ has a $\chi^2$-distribution with $f_i$ degrees of freedom, say, and $\Sigma f_i = f$.
The proof with the aid of the above theorem is almost immediate. We thus
get another formulation of the theorem of Cochran [1] on the decomposition of a
quadratic form.


REFERENCES 

[1] W. G. Cochran, "Distribution of quadratic forms in a normal system with applications to the analysis of covariance," Proc. Cambr. Phil. Soc., Vol. 30 (1934), pp. 173-191.

[2] A. T. Craig, "Note on the independence of certain quadratic forms," Annals of Math. Stat., Vol. 14 (1943), pp. 195-197.

[3] H. Hotelling, "Note on a matric theorem of A. T. Craig," Annals of Math. Stat., Vol. 15 (1944), pp. 427-429.

[4] M. Kac, "A remark on independence of linear and quadratic forms involving independent gaussian variables," Annals of Math. Stat., Vol. 16 (1945), pp. 400-401.

[5] B. Matérn, "Metoder att uppskatta noggrannheten vid linje- och provytetaxering" ("Methods of estimating the accuracy of line and sample plot surveys"), Meddelanden från Statens Skogsforskningsinstitut, Vol. 36 (1947), pp. 1-138.


A FORMULA FOR THE PARTIAL SUMS OF SOME 
HYPERGEOMETRIC SERIES 

By Hermann von Schelling
Naval Medical Research Laboratory, New London, Conn.¹

Let an urn contain $N$ balls of which $a$ are black and $b$ white. A single ball
is drawn. We note its color, return the ball into the urn and add $\Delta$ balls of the
same color. The probability $w(n_1)$ to obtain $n_1$ black balls in $n$ trials is given
by a formula due to F. Eggenberger and G. Pólya [1]:

¹ Opinions or conclusions contained in this paper are those of the author. They are not
to be construed as necessarily reflecting the views or endorsement of the Navy Department.





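(1) $w(n_1) = \dbinom{n}{n_1} \dfrac{a(a + \Delta) \cdots [a + (n_1 - 1)\Delta] \cdot b(b + \Delta) \cdots [b + (n - n_1 - 1)\Delta]}{N(N + \Delta) \cdots [N + (n - 1)\Delta]}$

($n$ fixed, $n_1$ variable).

Now, we fix $n_1$ and ask for the probability that the $n_1$th black ball appears at
the $n$th drawing. We find

(2) $w(n) = \dbinom{n - 1}{n_1 - 1} \dfrac{a(a + \Delta) \cdots [a + (n_1 - 1)\Delta] \cdot b(b + \Delta) \cdots [b + (n - n_1 - 1)\Delta]}{N(N + \Delta) \cdots [N + (n - 1)\Delta]}$

($n_1$ fixed, $n$ variable).

This function is the $(n - n_1 + 1)$th element of the series

$\dfrac{a(a + \Delta) \cdots [a + (n_1 - 1)\Delta]}{N(N + \Delta) \cdots [N + (n_1 - 1)\Delta]} F\left(n_1, \dfrac{b}{\Delta}, \dfrac{N}{\Delta} + n_1; 1\right)$.

Consequently, the probability that the $n_1$th black ball appears at the latest in the
$n$th drawing reads, with an obvious abbreviation,

(3) $W(n) = \sum_{i = n_1}^{n} w(i)$.

Now, we assume the $n_1$th black ball did not appear in the $n$th drawing. What
is the alternative? The $(n - n_1 + 1)$th white ball must have appeared in the
$n$th drawing at latest. The corresponding probability is according to the
equation (3)

(4) $\overline{W}(n) = \sum_{i = n - n_1 + 1}^{n} \dbinom{i - 1}{n - n_1} \dfrac{b(b + \Delta) \cdots [b + (n - n_1)\Delta] \cdot a(a + \Delta) \cdots [a + (i - n + n_1 - 2)\Delta]}{N(N + \Delta) \cdots [N + (i - 1)\Delta]}$.

The relation (4) originates from (3) by writing $b$ instead of $a$ (and $a$ instead of $b$) and $(n - n_1 + 1)$
instead of $n_1$. The alternatives add to certainty:

(5) $W(n) + \overline{W}(n) = 1$.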





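Change the notations in the following manner:

(6) $n_1 \to \alpha$, $\dfrac{b}{\Delta} \to \beta$, $\dfrac{N}{\Delta} + n_1 \to \gamma$, $n - n_1 + 1 \to \nu$.

From (6.1) and (6.4) find by addition

(7) $n \to \nu + \alpha - 1$.

From (6.1) and (6.3)

(8) $\dfrac{N}{\Delta} \to \gamma - \alpha$.

From (6.2) and (8)

(9) $\dfrac{a}{\Delta} \to \gamma - \alpha - \beta$.

Formula (5) reads now

(10) $\dfrac{(\gamma - \alpha - \beta)(\gamma - \alpha - \beta + 1) \cdots (\gamma - \beta - 1)}{(\gamma - \alpha)(\gamma - \alpha + 1) \cdots (\gamma - 1)} F_\nu(\alpha, \beta, \gamma; 1) + \dfrac{\beta(\beta + 1) \cdots (\beta + \nu - 1)}{(\gamma - \alpha)(\gamma - \alpha + 1) \cdots (\gamma - \alpha + \nu - 1)} F_\alpha(\nu, \gamma - \beta - \alpha, \gamma - \alpha + \nu; 1) = 1$.

$F_\nu(\alpha, \beta, \gamma; 1)$ denotes the partial sum of the first $\nu$ elements of the hypergeometric
series $F(\alpha, \beta, \gamma; 1)$. It is to be mentioned that $\alpha$ is a positive integer necessarily
as follows from (6.1). Since

$W(\infty) = 1 = \dfrac{(\gamma - \alpha - \beta)(\gamma - \alpha - \beta + 1) \cdots (\gamma - \beta - 1)}{(\gamma - \alpha)(\gamma - \alpha + 1) \cdots (\gamma - 1)} F_\infty(\alpha, \beta, \gamma; 1)$,

the relation (10) can be written

(11) $\dfrac{F_\nu(\alpha, \beta, \gamma; 1)}{F_\infty(\alpha, \beta, \gamma; 1)} + \dfrac{F_\alpha(\nu, \gamma - \beta - \alpha, \gamma - \alpha + \nu; 1)}{F_\infty(\nu, \gamma - \beta - \alpha, \gamma - \alpha + \nu; 1)} = 1$

where $\nu$ and $\alpha$ are positive integers.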

This result is not interesting from the standpoint of pure mathematics since
the sum $F(\alpha, \beta, \gamma; 1)$ is known. But the relation is useful for statisticians.
In calculating the function $W(n)$ they need a sum of $n_1$ elements instead of
$(n - n_1 + 1)$. If $n_1$ is small (and this holds in practical applications), the
exact calculation of $W(n)$ is possible for every $n$.
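
The saving in labour can be illustrated numerically. The following Python sketch evaluates $W(n)$ once as the direct sum over the waiting-time probabilities and once as the complement of the shorter sum for the white balls, using exact rational arithmetic. The urn contents ($a = 3$, $b = 5$, $\Delta = 2$) and all function names are hypothetical choices made for this illustration only.

```python
from fractions import Fraction
from math import comb


def rising(x, k, step):
    """Product x (x + step) ... [x + (k - 1) step]; equals 1 when k = 0."""
    out = Fraction(1)
    for j in range(k):
        out *= x + j * step
    return out


def w_fixed_n(n1, n, a, b, delta):
    """Probability of exactly n1 black balls in n drawings (n fixed)."""
    return (comb(n, n1) * rising(a, n1, delta) * rising(b, n - n1, delta)
            / rising(a + b, n, delta))


def w_wait(i, n1, a, b, delta):
    """Probability that the n1-th black ball appears exactly at drawing i."""
    return (comb(i - 1, n1 - 1) * rising(a, n1, delta)
            * rising(b, i - n1, delta) / rising(a + b, i, delta))


def W(n, n1, a, b, delta):
    """The n1-th black ball appears at the latest in drawing n."""
    return sum(w_wait(i, n1, a, b, delta) for i in range(n1, n + 1))


def W_bar(n, n1, a, b, delta):
    """Alternative event: interchange the colours, use n - n1 + 1 for n1."""
    return W(n, n - n1 + 1, b, a, delta)


# Hypothetical urn: 3 black and 5 white balls, 2 balls added per drawing.
a, b, delta, n, n1 = 3, 5, 2, 12, 2
long_sum = W(n, n1, a, b, delta)            # n - n1 + 1 = 11 terms
short_sum = 1 - W_bar(n, n1, a, b, delta)   # only n1 = 2 terms
print(long_sum == short_sum)
```

With $n_1 = 2$ and $n = 12$ the direct sum has eleven terms while the complementary sum has only two, which is the practical point made above.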


REFERENCES 

[1] F. Eggenberger and G. Pólya, Zeits. f. angew. Math. und Mech., Vol. 3 (1923), pp. 279-289.

[2] H. von Schelling, Naturwiss., Vol. 30 (1942), pp. 757-758.





THE VARIANCE OF THE PROPORTIONS OF SAMPLES FALLING 
WITHIN A FIXED INTERVAL FOR A 
NORMAL POPULATION 


By G. A. Baker
University of California, Davis 


Suppose that we have a normal population

$$y = \frac{1}{\sigma\sqrt{2\pi}}\exp\left[-\frac{(x-m)^2}{2\sigma^2}\right] \tag{1}$$

and we draw samples of $N$ from this population. We wish to estimate the
proportion, $p$, of the population between two fixed limits, $m + \lambda\sigma$ and $m + \mu\sigma$.
One way to make this estimate is simply to count the number of observed $x$'s
which fall in this interval. We shall denote this number by $n$. Then the ratio

$$n/N \tag{2}$$

is an estimate of $p$. If this is done the variance of the estimate is well known to be

$$\frac{p(1-p)}{N}. \tag{3}$$


The method of estimating p by counting the number in a definite interval is 
nonparametric and requires no assumption of normal or other specified type of 
sampled population for validity. However, if we know that the sampled 
population is normal then we may make use of this knowledge in estimating p 
and possibly obtain an improved estimate. 

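Another way to estimate $p$ which makes use of the form of the sampled
population is to compute

$$\bar{x} = \frac{1}{N}\sum x_i, \qquad s^2 = \frac{1}{N}\sum (x_i - \bar{x})^2, \tag{4}$$

and hence the integral

$$\int_{m+\lambda\sigma}^{m+\mu\sigma} \frac{1}{s\sqrt{2\pi}}\exp\left[-\frac{(x-\bar{x})^2}{2s^2}\right] dx. \tag{5}$$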


It is implied in elementary texts that (5) is a better estimate of p than is (2) 
although this point is not discussed. 

It is the purpose of the present note to discuss the variance of the estimate (5) 
and compare this variance with (3). 

Now (5) is a function of the first two moments of the sample and it follows
from an application of a theorem stated by H. Cramér [1] that (5) is asymptotically
normal with mean $p$ and variance given by





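$$\sigma_p^2 = \frac{1}{2\pi N}\left[\left(e^{-\lambda^2/2} - e^{-\mu^2/2}\right)^2 + \frac{1}{2}\left(\lambda e^{-\lambda^2/2} - \mu e^{-\mu^2/2}\right)^2\right]. \tag{6}$$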

To compare the relative efficiency of the counting method with (6) in complete
detail would be somewhat tedious. The referee suggests a brief discussion of
the cases $\lambda = -\infty$, where we are counting the proportion less than some known
value, and $\lambda = -\mu$, where a portion out of the middle of the distribution is being
counted. These cases are of particular practical interest.

If $\lambda = -\infty$, then (6) becomes

$$\sigma_p^2 = \frac{e^{-\mu^2}}{2\pi N}\left(1 + \frac{\mu^2}{2}\right). \tag{7}$$



We choose values of μ as indicated below:

       μ          p      Relative Efficiency of (3)
    -2.3263     0.01               0.27
    -1.2816     0.1                0.56
    -0.8416     0.2                0.66
    -0.5244     0.3                0.75
    -0.2533     0.4                0.64
     0.0000     0.5                0.64

We get values of the relative efficiency of (3) that are low for small p and somewhat
higher for larger values of p.

If $\lambda = -\mu$, then (6) becomes

$$\sigma_p^2 = \frac{\mu^2 e^{-\mu^2}}{\pi N}. \tag{8}$$

We choose values of μ as indicated below:

       μ          p      Relative Efficiency of (3)
     1.2816     0.8                0.63
     0.8416     0.6                0.46
     0.2533     0.2                0.12

We see that the relative efficiency of (3) ranges from close to 0.75 to rather small
values.

Other choices of $\lambda$ and $\mu$ yield relative efficiencies of about the same order of
magnitude as those illustrated.
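
The comparison can be repeated for any pair of limits. The Python sketch below computes the ratio of the asymptotic variance of the integral estimate to the variance $p(1-p)/N$ of the counting estimate, which is how the relative efficiency of (3) is understood here. It assumes the two-term form of the asymptotic variance (a contribution from the sample mean and one from the sample standard deviation); the function names are illustrative, and the printed values are not guaranteed to reproduce every rounded entry of the tables above.

```python
from math import exp, inf, pi
from statistics import NormalDist


def _e(z):
    """e^{-z^2/2}, taken as 0 at an infinite limit."""
    return 0.0 if abs(z) == inf else exp(-z * z / 2)


def _ze(z):
    """z e^{-z^2/2}, taken as 0 at an infinite limit."""
    return 0.0 if abs(z) == inf else z * exp(-z * z / 2)


def _cdf(z):
    """Standard normal distribution function, with explicit infinite limits."""
    if z == -inf:
        return 0.0
    if z == inf:
        return 1.0
    return NormalDist().cdf(z)


def n_var_integral(lam, mu):
    """N times the asymptotic variance of the integral estimate."""
    return ((_e(lam) - _e(mu)) ** 2
            + 0.5 * (_ze(lam) - _ze(mu)) ** 2) / (2 * pi)


def n_var_counting(lam, mu):
    """N times the variance p(1 - p)/N of the counting estimate."""
    p = _cdf(mu) - _cdf(lam)
    return p * (1 - p)


def relative_efficiency(lam, mu):
    """Variance of the integral estimate divided by that of the count."""
    return n_var_integral(lam, mu) / n_var_counting(lam, mu)


for lam, mu in [(-inf, -2.3263), (-inf, 0.0), (-1.2816, 1.2816)]:
    print(lam, mu, round(relative_efficiency(lam, mu), 2))
```

For $\lambda = -\infty$, $\mu = 0$ the ratio reduces to $2/\pi$, since the integral estimate then depends on the sample only through the mean.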


REFERENCE 

[1] Harald Cramér, "Mathematical Methods of Statistics," Princeton University Press,
1946, section 28.4, pp. 366-367.






THE POINT BISERIAL COEFFICIENT OF CORRELATION 


By Joseph Lev 

New York State Department of Civil Service 


The product moment coefficient of correlation between a continuous variate $y$
and a variate $x$ which takes the values 1 and 0 only, is known in psychological
statistics as the point biserial coefficient of correlation. Let $y_i$, $i = 1, \cdots, n$,
be observations on $y$; $y_{1i}$, $i = 1, \cdots, n_1$, be $y$ values which are paired with the
value $x = 1$; $y_{0i}$, $i = 1, \cdots, n_0$, be values paired with $x = 0$; $\bar{y}$, $\bar{y}_1$, and $\bar{y}_0$ be
the corresponding means; and $n = n_1 + n_0$. Then the point biserial coefficient
of correlation may be written

$$r = \frac{\bar{y}_1 - \bar{y}_0}{\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}\,\sqrt{\frac{n_1 n_0}{n}}. \tag{1}$$



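The distribution of $r$ is readily obtained when the $y_i$, $i = 1, \cdots, n$, are
distributed as

$$\frac{1}{\sigma\sqrt{2\pi}\sqrt{1-\rho^2}}\exp\left[-\frac{(y_i - a - \rho\sigma\xi_i)^2}{2\sigma^2(1-\rho^2)}\right], \tag{2}$$

where

$$\xi_i = \sqrt{n_0/n_1}, \quad i = 1, 2, \cdots, n_1; \qquad \xi_i = -\sqrt{n_1/n_0}, \quad i = n_1 + 1, n_1 + 2, \cdots, n,$$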

$\sigma^2$ is the variance of the $y_i$ about the common mean $a$, and $\rho$ is the parameter
which represents the correlation between the $y_i$ and the $x_i$. It is easy to verify
that the statistic in (1) is a maximum likelihood estimate of $\rho$.

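It will be convenient to express the two population means in (2) as $\mu_1$ and $\mu_0$
so that

$$\mu_1 = a + \rho\sigma\sqrt{\frac{n_0}{n_1}}, \qquad \mu_0 = a - \rho\sigma\sqrt{\frac{n_1}{n_0}}. \tag{3}$$

Hence

$$\rho = \frac{\sqrt{n_1 n_0}}{n}\cdot\frac{\mu_1 - \mu_0}{\sigma}. \tag{4}$$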




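Now write

$$t = \frac{\sqrt{n-2}\;r}{\sqrt{1-r^2}} = \frac{\bar{y}_1 - \bar{y}_0}{\sqrt{\dfrac{n}{n_1 n_0}\cdot\dfrac{1}{n-2}\sum_{i=0}^{1}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_i)^2}}, \tag{5}$$

where $r$ is obtained from (1).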

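Using (5) we may write $t$ as

$$t = \frac{\dfrac{(\bar{y}_1 - \bar{y}_0) - (\mu_1 - \mu_0)}{\sigma\sqrt{1-\rho^2}\sqrt{n/(n_1 n_0)}} + \dfrac{\mu_1 - \mu_0}{\sigma\sqrt{1-\rho^2}\sqrt{n/(n_1 n_0)}}}{\dfrac{1}{\sigma\sqrt{1-\rho^2}}\sqrt{\dfrac{\sum_{i=0}^{1}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_i)^2}{n-2}}}.$$

Therefore $t$ has the non-central $t$ distribution [1] with $n - 2$ degrees of freedom and
$\delta = \sqrt{n}\,\rho/\sqrt{1-\rho^2}$.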
The methods and tables given in [1] may be used to calculate tests of significance
and confidence limits for $\rho$.

When $\rho = 0$, $t$ has Student's distribution, and the statistic $t = \sqrt{n-2}\,r/\sqrt{1-r^2}$
may be used to test the hypothesis, $\rho = 0$, by means of the $t$ tables
with $n - 2$ degrees of freedom. The non-central $t$ distribution then determines
the power function of this test.

Table IV of [1] can be used to calculate confidence limits for $\rho$. If the confidence
interval is to be based on equal tails of the distribution choose a confidence
coefficient $1 - 2\epsilon$. Then compute $\delta(f, t_0, \epsilon)$ and $\delta(f, t_0, 1 - \epsilon)$, where $f = n - 2$,
and $t_0 = \sqrt{n-2}\,r/\sqrt{1-r^2}$.

A lower limit for $\rho$ is given by

$$\frac{\delta(f, t_0, \epsilon)}{[n + \delta^2(f, t_0, \epsilon)]^{1/2}},$$

and an upper limit by

$$\frac{\delta(f, t_0, 1-\epsilon)}{[n + \delta^2(f, t_0, 1-\epsilon)]^{1/2}}.$$
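
The relation between the point biserial coefficient and the ordinary two-sample statistic can be checked on a small example. The Python sketch below computes $r$ as the product moment correlation of $y$ with the 0/1 variate, forms $t = \sqrt{n-2}\,r/\sqrt{1-r^2}$, and compares it with the pooled two-sample $t$. The data and the function names are hypothetical and serve only as an illustration.

```python
from math import sqrt


def point_biserial(y1, y0):
    """Product moment correlation between y and x (x = 1 for y1, 0 for y0)."""
    n1, n0 = len(y1), len(y0)
    n = n1 + n0
    ys = list(y1) + list(y0)
    xs = [1.0] * n1 + [0.0] * n0
    my, mx = sum(ys) / n, sum(xs) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def t_from_r(r, n):
    """Statistic sqrt(n - 2) r / sqrt(1 - r^2)."""
    return sqrt(n - 2) * r / sqrt(1 - r * r)


def pooled_t(y1, y0):
    """Two-sample t with the pooled within-group sum of squares."""
    n1, n0 = len(y1), len(y0)
    n = n1 + n0
    m1, m0 = sum(y1) / n1, sum(y0) / n0
    sw = sum((v - m1) ** 2 for v in y1) + sum((v - m0) ** 2 for v in y0)
    return (m1 - m0) / sqrt(n / (n1 * n0) * sw / (n - 2))


# Hypothetical observations paired with x = 1 and x = 0 respectively.
y1 = [5.1, 6.0, 7.2, 6.5]
y0 = [4.0, 4.8, 5.5, 3.9, 5.0]
r = point_biserial(y1, y0)
print(r, t_from_r(r, len(y1) + len(y0)), pooled_t(y1, y0))
```

The last two printed numbers agree up to rounding, which is why the ordinary $t$ tables with $n - 2$ degrees of freedom serve for the test of $\rho = 0$.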


REFERENCE 

[1] N. L. Johnson and B. L. Welch, "Applications of non-central t-distribution," Biometrika,
Vol. 31 (1940), pp. 362-389.







A NOTE ON KAC’S DERIVATION OF THE DISTRIBUTION OF THE 

MEAN DEVIATION 


By H. J. Godwin 


University College of Swansea, Wales

In a paper on a general class of estimates of deviations, Kac [3] obtained an
expression for the frequency function of the estimate of mean deviation from
the mean in normal samples. He was unable to establish the identity of this
with an expression obtained earlier by me [1]. I now shew that the two results
are, in fact, equivalent.

Kac uses the functions defined as the $k$-fold convolution of


f(a:) = 



X < 0; 
X > 0. 


I used the functions $G_k(x)$ defined by the recurrence relation


(1) Go(x) = 1, G,(x) = r dt 

Jo 

Now I have shewn elsewhere [2] that the integral of + ■+''*’ taken through 
the interior of a regular simplex in k dimensions, with its centroid at the origin 
and of side a,isy/k + 1 Giia/\^). The relation (1) corresponds to a dissection 
of the simplex into sections, which are $(k - 1)$-dimensional simplexes, by joining
the centroid to the vertices and taking sections parallel to the base of each of the
$(k + 1)$ smaller simplexes so formed. If however we take sections parallel
to a base of the whole simplex we get another recurrence relation, viz.

(2) G, (x) = r dt . 

Jo 


Now (2) may be re-written 




n* 


/** -(n»(x-<)»/2) Gk^i{nt)e ‘ 

Jo " -- 


whence, by induction, Gk-i(nx)-e = n and the equivalence 

of Kac’s result to mine is established. 


REFERENCES 

[1] H. J. Godwin, "On the distribution of the estimate of mean deviation obtained from
samples from a normal population," Biometrika, Vol. 33 (1945), pp. 254-256.

[2] H. J. Godwin, "A further note on the mean deviation," Biometrika, Vol. 35 (in the
press).

[3] M. Kac, "On the characteristic functions of the distributions of estimates of various
deviations in samples from a normal population," Annals of Math. Stat., Vol. 19
(1948), pp. 257-261.





CORRECTION TO “ASYMPTOTIC FORMULAS FOR SIGNIFICANCE 
LEVELS OF CERTAIN DISTRIBUTIONS” 

By A. M. Peiser
New York City 

Professor Henry Scheffé has recently pointed out to me an error in my paper
"Asymptotic formulas for significance levels of certain distributions," which
appeared in Annals of Math. Stat., Vol. 14 (1943), pp. 56-62. In the determination
of the significance levels of Student's $t$ distribution, appeal was made to a
theorem of Cramér which requires independent random variables. The variables
defined at the top of page 61, however, cannot be taken as independent, so that
the theorem does not apply.

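The asymptotic formula (following the notation of the paper)

$$t_{p,n} = y_p + \frac{y_p^3 + y_p}{4n} + o\!\left(\frac{1}{n}\right),$$

where

$$\Phi(y_p) = 1 - p, \qquad \Phi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-v^2/2}\,dv,$$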

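is nevertheless correct. This may be shown directly from the distribution
function

$$G_n(x) = \frac{\Gamma\left(\frac{1}{2}(n+1)\right)}{\sqrt{n\pi}\;\Gamma\left(\frac{1}{2}n\right)}\int_{-\infty}^{x}\left(1 + \frac{t^2}{n}\right)^{-(n+1)/2} dt.$$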

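Writing

$$\left(1 + \frac{t^2}{n}\right)^{-(n+1)/2} = \exp\left[-\frac{n+1}{2}\log\left(1 + \frac{t^2}{n}\right)\right] = \exp\left[-\frac{n+1}{2}\left(\frac{t^2}{n} - \frac{t^4}{2n^2} + \cdots\right)\right], \qquad \left|\frac{t^2}{n}\right| < 1,$$

and using Stirling's formula, it follows that $G_n(x)$ can be written in the form

$$G_n(x) = \frac{1}{2} + \frac{1}{\sqrt{2\pi}}\int_0^x e^{-t^2/2}\left[1 + \frac{t^4 - 2t^2 - 1}{4n} + \frac{Q_n(t)}{n^2}\right] dt = \Phi(x) - \frac{x^3 + x}{4n\sqrt{2\pi}}\,e^{-x^2/2} + \frac{1}{n^2\sqrt{2\pi}}\int_0^x Q_n(t)\,e^{-t^2/2}\,dt,$$

where $Q_n(t)$ is a bounded function of $t$ and $n$ in each finite interval.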

Let $t_{p,n} = y_p + a_n$, where $a_n = o(1)$. Then $G_n(t_{p,n}) = \Phi(y_p) = 1 - p$, and
we have





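$$\Phi(y_p + a_n) - \Phi(y_p) = \frac{(y_p + a_n)^3 + (y_p + a_n)}{4n\sqrt{2\pi}}\,e^{-\frac{1}{2}(y_p + a_n)^2} + O\!\left(\frac{1}{n^2}\right),$$

so that

$$\lim_{n\to\infty} n a_n = \frac{y_p^3 + y_p}{4}.$$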


This is the required result. 
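
The limit can be checked numerically. The Python sketch below evaluates the distribution function of Student's $t$ by Simpson's rule, solves $G_n(t) = 1 - p$ by bisection, and compares $n(t_{p,n} - y_p)$ with $(y_p^3 + y_p)/4$. The choice $p = 0.05$, $n = 200$, the step counts, and the function names are assumptions made for this illustration.

```python
from math import exp, lgamma, pi, sqrt
from statistics import NormalDist


def t_cdf(x, n, steps=2000):
    """G_n(x) for x >= 0, by Simpson's rule on [0, x] (steps must be even)."""
    c = exp(lgamma((n + 1) / 2) - lgamma(n / 2)) / sqrt(n * pi)

    def f(t):
        return (1 + t * t / n) ** (-(n + 1) / 2)

    h = x / steps
    s = f(0.0) + f(x)
    for j in range(1, steps):
        s += (4 if j % 2 else 2) * f(j * h)
    return 0.5 + c * s * h / 3


def t_quantile(p, n):
    """Solve G_n(t) = 1 - p for 0 < p < 1/2 by bisection on [0, 20]."""
    lo, hi = 0.0, 20.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, n) < 1 - p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


p, n = 0.05, 200
y_p = NormalDist().inv_cdf(1 - p)          # Phi(y_p) = 1 - p
t_exact = t_quantile(p, n)
t_approx = y_p + (y_p ** 3 + y_p) / (4 * n)
print(t_exact, t_approx, n * (t_exact - y_p), (y_p ** 3 + y_p) / 4)
```

The difference between the two significance levels is of order $1/n^2$, so the agreement improves rapidly as $n$ grows.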



ABSTRACTS OF PAPERS 


(Abstracts of papers presented at the Seattle Meeting of the Institute on 
November 26-27, 1948) 

1. Estimation of the Variance of the Bivariate Normal Distribution. Harry
M. Hughes, University of California, Berkeley.

Let $x_1$ and $x_2$ be two random variables normally distributed with known means $m_1$ and $m_2$,
and with common unknown variance $\sigma^2$. Consider an experiment in which the observed
variable is $Y = \sqrt{(x_1 - m_1)^2 + (x_2 - m_2)^2}$. This paper considers the problem of estimating
the parameter $\sigma$ when the observations are grouped. By the method of minimum reduced
chi-square with linear restrictions, two best asymptotically normal estimates are derived.
By minimization of the asymptotic variance of these estimates, the optimum choice of
grouping is found as a function of $\sigma$. For two and for three groups, when it is known or
assumed a priori that $\sigma$ is on a certain finite interval, the optimum grouping is derived
which will minimize the maximum asymptotic variance on that interval. If such interval
is moderately small, it is shown that the optimum grouping is the same as if $\sigma$ were known to
have the value at the upper end of the interval. Finally the effect of using non-optimum
grouping is analyzed.

2. Derivation of a Broad Class of Consistent Estimates. R. C. Davis, Inyokern,
California.

Given a chance vector $X$ with cumulative distribution function $F(X, \theta)$, where $\theta$ is an
unknown parameter vector, a broad class of estimates of $\theta$ is derived having the following
properties: a) any estimate in this class is a consistent estimate of $\theta$; b) any estimate is a
symmetric function of independent observations of the chance vector $X$. The novel feature
of this class is that no assumptions about the existence of various partial derivatives of a
density function with respect to $\theta$ are made. As a matter of fact not even the existence of
a density function is required, and it is postulated merely that a continuous function of $X$
for each $\theta$ (in a certain neighborhood of the true $\theta_0$) and of $\theta$ for each $X$ exist which satisfies
a Lipschitz condition in $\theta$. For each such function having a finite first moment an estimate
of $\theta$ is constructed which has the properties a) and b) listed above.

3. Locally Best Unbiased Estimates. Edward W. Barankin, University of
California, Berkeley.

Let $P = \{p_\theta(x);\ \theta \in \Theta\}$ be a family of probability densities in the space $\Omega$ of points $x$,
and $g$ a function on $\Theta$. Let $s$ be fixed and $> 1$; call an unbiased estimate of $g$ best at $\theta_0$ if its
$s$-th absolute central moment (s.a.c.m.) under $p_{\theta_0}$ is (finite and) not greater than the
s.a.c.m., under $p_{\theta_0}$, of any unbiased estimate of $g$. With a certain integrability postulate
on the $p_\theta$'s, a necessary and sufficient condition, of finite character, is established for the
existence of an unbiased estimate of $g$ having a finite s.a.c.m. under $p_{\theta_0}$. When such a one
exists, there then exists a unique unbiased estimate which is best at $\theta_0$. The existence
condition defines the s.a.c.m. of the best estimate explicitly as the l.u.b. of a set of numbers;
in particular, we obtain immediately a generalization of the Cramér-Rao inequality. Also,
when it exists, the best unbiased estimate is explicitly constructed from the function $g$
and the densities $p_\theta$. The case $s = 2$ is studied more closely. Also, a detailed example
is considered.




4. Some Problems Related to the Distribution of a Random Number of Random
Variables. Edward Paulson, University of Washington, Seattle.

Let $\{x_i\}$ $(i = 1, 2, 3, \cdots)$ be a set of independent random variables with identical distributions,
with $E(x) = a$ and $\sigma^2(x) = b$ $(0 < b < \infty)$. Let $N$ be a positive integral-valued random
variable with distribution $F_\lambda(N)$ depending on a parameter $\lambda$, where $E(N) = A_\lambda$, and
$\sigma^2(N) = B_\lambda$ $(0 < B_\lambda < \infty)$. Now let $T_N = x_1 + x_2 + \cdots + x_N$. The limiting distribution
as $\lambda \to \infty$ of $(T_N - aA_\lambda)/\sqrt{a^2 B_\lambda + bA_\lambda}$ has been investigated by Robbins (Proc. Nat. Acad.
Science, Vol. 34 (April 1948), pp. 162-163) for several different sets of conditions on the
distribution of $N$. It can be shown that analogous results will hold if instead of $T_N$ we consider
a more general statistic $T_N^*$, whose conditional distribution with respect to the variable $N$
is such that there exist constants $a_1$ and $b_1$ so that

lim E 

N-*oo 




= h(t) 


uniformly in any finite $t$ interval, where $h(t)$ is a characteristic function. Returning to the
statistic $T_N$, it can be shown that there exists an asymptotic expansion in powers of $\lambda^{-1/2}$


with remainder 0(X 


-(k/2) »n 


Tn — aAx 


) for ~ < when the following conditions arc 

^ Va^Bx + 6Ax - J . 


satisfied: (1) the distribution function of x has a non-zero absolutely continuous compo¬ 
nent, (2) Ei\ X \^) < 00 , and (3) X -> «> through integral values, and Fx(N) is the Xth con¬ 
volution of a random variable n such that < 00 . 


5. Asymptotic Expansions for the Distribution of Certain Likelihood Ratio
Statistics. Albert H. Bowker, Stanford University.

Asymptotic expansions of the "Cramérian" type are derived for the distribution of
likelihood ratio statistics given by Wilks for testing various hypotheses about means,
variances, and covariances of a normal multivariate distribution. The point of departure
is Wilks' result that minus twice the logarithm of the likelihood ratio has the $\chi^2$ distribution;
terms in $1/N$, $1/N^2$, $\cdots$ are expressed in terms of the $\chi^2$ distribution. In addition,
asymptotic expansions of the "Fisher-Cornish" type are obtained for the percentage points
and for a transformation of the statistic to a $\chi^2$ variate.

6. On a Problem of Confounding in Symmetrical Factorial Design. Esther
Seiden, University of California, Berkeley.

Let $m_3(r, s)$ be the maximum number of factors that it is possible to accommodate in a
symmetrical factorial experiment in which each factor is at $s = p^n$ levels ($p$ being any
positive prime number, $n$ a positive integer) and each block is of size $s^r$, without confounding
any degrees of freedom belonging to any interaction involving 3 or lesser number of
factors.

R. C. Bose proved in a paper "Mathematical theory of factorial design," Sankhyā, Vol. 8
(1947), pp. 107-166, that the following inequality holds:
$s^2 + 1 \le m_3(4, s) \le s^2 + s + 2$. This gives in case $s = 4$, $17 \le m_3(4, 4) \le 22$. It is now
proved that $m_3(4, 4) = 17$.

The proof consists in showing that the maximum number of non three collinear points
which can be chosen in a finite projective space $PG(3, 4)$ cannot exceed 17, which according
to a proof of R. C. Bose is equivalent to the statement that $m_3(4, 4)$ cannot exceed 17.





7. Some Bounded Significance Level Tests of Whether the Largest Observations
of a Set are Too Small (Preliminary Report). John E. Walsh, Santa
Monica, California.

A set of $n$ observations are given which satisfy: (1) They are independent and from
continuous symmetrical populations; (2) The $r$ largest observations are from populations
with median $\theta$ while the remaining observations are from populations with median $\varphi$. It
is required to test whether $\theta < \varphi$. Let $x(1), \cdots, x(n)$ denote the observations arranged
in increasing order of magnitude. For $r = 1$ tests of the form: Accept $\theta < \varphi$ if $x(n) <
2x(w_\alpha) - x(i)$, where $\alpha = \Pr[x(n) + x(i) < 2\theta \mid \theta = \varphi]$ and $w_\alpha$ is the smallest integer
satisfying $\Pr[x(w_\alpha) > \theta \mid \theta = \varphi] \le \alpha$, can be obtained for $n > 15$. Exact significance levels
can be obtained by assuming a sample from a specified population (e.g. normal). On the
basis of (1)-(2) alone, the significance level never exceeds $2\alpha$. For large $n$, tests can be obtained
for any $r$ if the observations satisfy the additional weak condition: (3) The tail order
statistics are approximately independent of the central order statistics; also the variances
of the tail order statistics are either very large or very small compared with the variances
of the central order statistics. The test is: Accept 0 <(p if max[x(ft) -f- j(n — jk);
1 ® < 2x(wa)t where L* < L*+i tj>' yjn - r -- l,Wa is the smallest integer sat¬ 
isfying Vr[x{w a) > 0 \ 0 = <p] ^ and a = Pr{max[a:(u) -f x{n — jk)] <2 0\0 (p\. For 

large n the significance level is approximately a but is < 2a for all n. The power function 
—♦1 asv? — ^ and —>0asv’-~ d— 


8. Determination of Optimal Test Length to Maximize the Multiple Correlation.
Paul Horst, University of Washington, Seattle.

If the lengths of the tests in a battery are altered, their intercorrelations and their validities
or correlations with a criterion are also altered. Consequently, the multiple correlation
of the battery with the criterion will also be altered. These changes are a function of the
reliabilities of the tests. Suppose we have given from a set of experimental data (1) the
time allowed for each test in the battery, (2) the reliability of each test, (3) the intercorrelations,
and (4) the validities of all the tests. If we specify the overall testing time we are
willing to allow for the test in the future, we can determine the amount by which each test
must be altered in order to give the maximum multiple correlation with the criterion. The
method, together with numerical examples and the mathematical proof, is presented.

9. Some Numerical Comparisons of a Non-Parametric Test with other Tests.
F. J. Massey, University of Oregon, Eugene.

Let $F(x)$ be the cumulative distribution function of a R.V. $X$, and let $x_1 < x_2 < \cdots < x_n$
be the results of $n$ independent observations ordered as to size. Define

$$S_n(x) = \begin{cases} 0 & \text{if } x < x_1; \\ k/n & \text{if } x_k \le x < x_{k+1}; \\ 1 & \text{if } x_n \le x. \end{cases}$$

To test the hypothesis $H_0$: $F(x) = F_0(x)$, where $F_0(x)$ is completely specified, use the
criterion: reject $H_0$ if $\max |S_n(x) - F_0(x)| > \lambda/\sqrt{n}$. Choice of $\lambda$ determines the first kind
of error. The second kind of error against specified alternatives can be calculated
numerically.
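
The criterion is simple to compute, because the maximum of $|S_n(x) - F_0(x)|$ is attained at (or just before) one of the ordered observations. A minimal Python sketch follows; the sample, the uniform hypothesis, the value $\lambda = 1.36$, and the function names are hypothetical choices for illustration and are not taken from the abstract.

```python
from math import sqrt


def max_deviation(sample, cdf0):
    """max |S_n(x) - F_0(x)| for the step function S_n of the sample."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for k, x in enumerate(xs, start=1):
        f = cdf0(x)
        # S_n jumps from (k - 1)/n to k/n at x_k; check both sides.
        d = max(d, k / n - f, f - (k - 1) / n)
    return d


def reject(sample, cdf0, lam):
    """Reject H_0 when the maximum deviation exceeds lam / sqrt(n)."""
    return max_deviation(sample, cdf0) > lam / sqrt(len(sample))


def uniform_cdf(x):
    """Hypothetical F_0: the uniform distribution on (0, 1)."""
    return min(1.0, max(0.0, x))


sample = [0.1, 0.4, 0.7]
print(max_deviation(sample, uniform_cdf), reject(sample, uniform_cdf, 1.36))
```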





10. On the Deviation of Extreme Values. W. J. Dixon, University of Oregon,
Eugene.

Let $x(i)$ be the $i$th observation in order of magnitude in a sample of size $n$. The distribution
of the ratio $[x(n) - x(2)]/[x(n) - x(1)]$ is obtained explicitly for samples from a rectangular
distribution and for $n = 3, 4, 5$, for samples from a normal distribution. Percentage values of
this ratio for values of $n$ up to 30 are presented. Generalizations of the ratio are indicated.


11. The Optimum Size of Interval for Making Measurements of a Rocket's
Angular Velocity. Edward A. Fay, University of California, Berkeley.

Over a given range of time $0 < \tau < T$, the angular velocity of a rocket's spin is adequately
represented by a polynomial $\xi(\tau)$ of given degree $s - 1$ but with unknown coefficients.
The rocket's angular acceleration and the angle through which it spins in a given time-interval
may then be obtained respectively by differentiating and integrating $\xi(\tau)$. Let
$\nu$ be an integer $\ge s$, let $t = T/\nu$, and let $\eta_i$ be the angle through which the rocket turns in the
interval $(i - 1)t < \tau < it$. While $\xi(\tau)$ and $\xi'(\tau)$ cannot be directly observed, the angles
$\eta_1, \eta_2, \cdots, \eta_\nu$ can. Let $Y_i$ be an observation on $\eta_i$, and assume that $Y_1, Y_2, \cdots, Y_\nu$
are independent homoscedastic variables. The $Y_i$ may then be combined by the method
of least squares to obtain best linear estimates $X(\tau, t)$ and $X'(\tau, t)$ of $\xi(\tau)$ and $\xi'(\tau)$. The
choice of $t$ is at the observer's disposal. For the cases $s = 2, 3, 4$, and for the cases that the
common variance of the $Y_i$ is (a) independent of $t$ or (b) proportional to $t$, an expression is
derived for the variance of $X'(\tau, t)$, and the maximum value of that variance over the
range of $\tau$ is minimized with respect to $t$. The method is of much more general application.


12. Stationary Time Series Analysis and Common Stock Price Forecasting.
Zenon Szatrowski, University of Oregon, Eugene.

The objective of this paper is to present a statistical procedure of practical value in the
problem of extracting information from the past behaviour of economic time series, information
to be used in projecting future patterns. The author feels that his approach yields
results closer to reality than the techniques described by Herman Wold, M. G. Kendall,
H. T. Davis, and in particular, the technique of "disturbed harmonics" used by G. U. Yule.
The idea of the proposed technique can be described by examining the autoregression
scheme, which seems to be considered the most desirable by the above men. A simple
example of such a scheme is the equation

$$u_{t+2} = -a u_{t+1} - b u_t + \epsilon_{t+2},$$

where the $u$'s are the time series values and the $\epsilon$'s are random elements. The above linear
relationship, when determined either directly or through an empirical correlogram (for
which data is usually inadequate) is a kind of an average relationship. It may be as inappropriate
in estimating future values of a time series as would be an average in estimating
the level of a series with a pronounced trend.

The author proposes using derived time series to shed light on the nature of the changes
in the parameters under consideration. Such derived series could be estimates of the $a$'s
and $b$'s for successive time periods. The author has found that projections of common
stock price fluctuations were improved considerably when the changing nature of the
cyclical pattern was taken into account. This was done by constructing derived time
series, "moving" estimates of the amplitude, period and phase of the dominant harmonics.






The author points out that the above approach has shown promise in commodity prices
as well as common stocks. The value of this approach in forecasting lies in the facts that
(1) it does not require forecasts of other series and (2) it is based on the realistic assumption
that history repeats itself but with variations, variations which may be taken into account
through appropriate models.
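
The notion of a derived series of coefficient estimates can be sketched as follows. The Python code below fits $a$ and $b$ in the second-order autoregression by least squares and then repeats the fit over successive windows of the series. It is only one possible reading of "moving" estimates, not the author's procedure; the window length, the test series, and the function names are hypothetical.

```python
def simulate(a, b, u0, u1, shocks):
    """Generate u[t+2] = -a u[t+1] - b u[t] + shock, from two starting values."""
    u = [u0, u1]
    for e in shocks:
        u.append(-a * u[-1] - b * u[-2] + e)
    return u


def fit_ab(u):
    """Least-squares estimates of (a, b) from one stretch of the series."""
    s11 = s12 = s22 = r1 = r2 = 0.0
    for t in range(len(u) - 2):
        x1, x2, y = -u[t + 1], -u[t], u[t + 2]
        s11 += x1 * x1
        s12 += x1 * x2
        s22 += x2 * x2
        r1 += x1 * y
        r2 += x2 * y
    det = s11 * s22 - s12 * s12      # determinant of the normal equations
    return (r1 * s22 - r2 * s12) / det, (s11 * r2 - s12 * r1) / det


def moving_fit(u, window):
    """Derived series: (a, b) estimated on each successive window."""
    return [fit_ab(u[k:k + window]) for k in range(len(u) - window + 1)]


# Hypothetical noise-free series with a = -1.2, b = 0.5 (damped oscillation).
u = simulate(-1.2, 0.5, 1.0, 0.5, [0.0] * 20)
print(fit_ab(u))
print(moving_fit(u, 6)[:3])
```

With a fixed scheme the moving estimates stay constant; drift in them is what would signal changing parameters.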

13. Distribution of the Number of Schools of Fish Caught Per Boat. J. Neyman,
University of California, Berkeley.

Let $\lambda$ be the average number of schools of fish per unit area of a fishing ground $A$. Let $a$
be any area partial to $A$, and let $\varphi(n, a, \lambda)$ denote the probability that exactly $n$ schools of
fish will be found within $a$. At time $t = 0$ a boat begins scouting for fish in $A$ traveling at
constant speed $v$. It is assumed that all schools of fish within distance $r$ of the boat are
detected and none is detected at a greater distance. If $s \ge 1$ schools are detected then they
are caught in turn, the catching of one school taking up exactly $h$ hours. $X(t)$ denotes the
random variable representing, for each $t \ge 0$, the number of schools caught up to time $t$
including the one which may be in the process of being caught at the moment $t$. Probability
distribution of $X(t)$ is given by the formula

$$P\{X(t) \le k\} = \sum_{m=0}^{k} \varphi[m, 2rv(t - kh), \lambda]$$

for $k = 0, 1, 2, \cdots, n - 1$, where $n - 1$ is the greatest integer smaller than $t/h$. Of course
$P\{X(t) \le n\} = 1$. This result is easily generalized for the joint distribution of catches of
several boats fishing in the same area so that their paths do not cross. Assuming specific
functions to represent $\varphi(n, a, \lambda)$ formulae may be obtained to estimate the parameters $\lambda$
and $rv$.

14. Some Problems in Fishery Research to which Statistical Methods are 
Applicable (Preliminary Report). Ralph P. Silliman, U. S. Fish and 
Wildlife Service, Seattle. 

One of the most difficult problems is the obtaining of a random sample of a fish population.
Rarely are such populations randomly distributed over any area, and the samples
must often be taken from the catches of fishing vessels which do not uniformly cover even a
part of the area of distribution of the population. Many distributions of variables found in
fishery research are not normal, and statistical methods based on the normal distribution
can be applied only through the use of unsatisfactory transformations. Since fishery
research is largely observational in technique, data reflecting the concurrent effect of
several variables are usually obtained. Although the present methods of multiple correlation
and regression can be used in some instances to measure the relative effect of the
separate variables, there are many situations in which these methods do not yield good
results. Finally, many data used in fishery research must be adjusted before use, and
existing methods do not give good measures of the expected variability of such adjusted
data. Examples of specific problems are found in the distribution of deliveries and the
variations in catch of Columbia River chinook salmon.

15. The Application of the Hypergeometric Distribution to Problems of Estimating
and Comparing Zoological Population Sizes. Douglas Chapman,
University of California, Berkeley.

Estimates and tests of the type, as developed by Neyman, are adapted to sampling 
without replacement from a finite population. These results are applied to problems of 





estimation and comparison of zoological population sizes as determined by sampling procedures.
For single samples the bias and variance of different estimators are compared.
Finally some numerical calculations are made for various population and sample sizes to 
determine how different sample sizes and different methods of analysis affect the size of the 
critical region which is necessarily an approximation to the desired size. For some of 
these the power of the test is considered. 


16. Extension to Multivariate Case of Neyman’s Smooth Test with Astronomical 
Application. Elizabeth L. Scott, University of California, Berkeley. 

It is more or less generally accepted that the distribution of extra-galactic nebulae in 
space is not uniform in the small. In particular, counts in small cubes show distinct signs 
of contagion. On the other hand, it is not settled whether or not lack of uniformity in the 
large exists. One way of making this statement precise is to assert that the power series 
expansion of the logarithm of the probability density of the two angular coordinates of the 
nebulae within a given large area on the unit sphere does not contain low order terms. 
In fact, any such low order terms could be interpreted as determining “trends” or what 
could be described as lack of uniformity in the large. From this point of view, uniformity 
in the large may be tested by a two dimensional Neyman Smooth Test for goodness of fit. 

Let $\{\pi_{ij}(x, y)\}$ be a sequence of polynomials in $x$ and $y$ ortho-normal for $|x| < a$ and
$|y| < b$. If $x_k$ and $y_k$ are the coordinates of the $k$th out of $n$ nebulae counted within the
rectangle $(-a, a)$, $(-b, b)$ then the smooth test of $m$th order consists of rejecting the
hypothesis of uniformity in the large when

$$\sum_{1 \le i+j \le m}\left(\sum_{k=1}^{n}\pi_{ij}(x_k, y_k)\right)^2 \ge n\chi_\epsilon^2,$$

where $\chi_\epsilon^2$ is the tabled value of $\chi^2$ with $m(m + 3)/2$ degrees of freedom.

17. A Mathematical Theory of Vitamin A Metabolism in Fish {Preliminary 
Report). Normal E. Cooke, Vancouver, B.C. 

Several possible hypotheses for vitamin A metabolism in fish are developed from simple 
postulates. These hypotheses are tested (by least squares method) against experimental 
data in an attempt to deduce the correct mechanism. 


18. The Interactance Hypothesis between Populations. Stuart C. Dodd, 
University of Washington, Seattle. 

The hypothesis of interacting between human populations, or of demographic gravitation,
is that the number of interactions between two communities (or other groups) tends
to vary directly with the product of the two populations and their "specific coefficients"
and the overall duration and tends to vary inversely with the intervening distance and the
average duration of an interact. The hypothesis is tested by isolating factors and measuring
their correlation with the amount of interacting in the pairs of a set of $N$ communities.

This hypothesis is supported by studies of telephoning; news circulating; travel by bus,
train, or plane; R. R. express; college attendance; intermarrying; etc. Further lists of
interhuman actions are suggested for investigation.

A new corroborating bit of data comes from a poll by the Washington Public Opinion
Laboratory in a Seattle housing project where negro-white relations threatened violence.
The tension units of verbal interaction (defined as one anti-negro opinion asserted by one
white person) were observed to decrease inversely with a power of the distance from a rape
site. The observed tension correlated with the formulas or curves predicting that tension
at $\rho = .94$ and passed the chi-square test at the one per cent level. The tension is dimensionally
analyzed as a social force and social energy.





19. The Employment of Marked Members in the Estimation of Animal Popula¬ 
tions. Milner B. Schaefer, U. S. Fish and Wildlife Service, Honolulu, 
T. H. 


The estimation of population numbers by marked members is an important technique in 
fisheries research. The number N of individuals in the population, of which T are known 
to be marked, may be estimated from a sample of n of which t are found to be marked. 

nT 

Several estimates are available, all of which reduce to iV =» — when the numbers are all 

’ i 


large, but more precise formulae should be used when the numbers are not all large. An 
estimate of the variance of N has been derived by Karl Pearson {Biometrika, Vol. 20 (1928), 
pp. 149-174) on the basis of inverse probabilities. The sampling error may also be measured 
by means of confidence intervals. Formulae have been developed for estimating N from 
repeated samples of the same population, but no very suitable estimates of the sampling 
error are available in this case. For some migratory fishes marked at a point on their 
migration path and sampled later at another point, there e.xists a correlation between time 
of marking and time of recovery in the subsequent samples. In such case, the total number 
of fish marked or drawn in the subsequent samples cannot in general be regarded as random 
samples of the population. Where numbered tags are employed as marks, so the fish may 
be individually identified both when marked and recovered, a method of estimating N in 
this case also is suggested. 
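
The large-number estimate described above (sample size times marked total, divided by marked recoveries) can be sketched in a few lines of Python. The function names are illustrative. The second function is a Chapman-type small-sample adjustment, included only as one well-known example of a "more precise formula"; the abstract does not say which formulae its author had in mind.

```python
def simple_estimate(T, n, t):
    """Estimate N as n*T/t: T marked in the population, n sampled, t marked in the sample."""
    if t <= 0:
        raise ValueError("at least one marked individual must be recovered")
    return n * T / t


def adjusted_estimate(T, n, t):
    """Chapman-type adjustment (an assumption here, not taken from the abstract)."""
    return (T + 1) * (n + 1) / (t + 1) - 1


if __name__ == "__main__":
    # Hypothetical numbers: 100 marked fish, 50 caught later, 10 of them marked.
    print(simple_estimate(100, 50, 10))
    print(adjusted_estimate(100, 50, 10))
```

The two estimates agree closely when all counts are large and diverge when t is small, which is the situation the abstract warns about.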


20. Non-Response and Repeated Call-Backs in Sampling Surveys. Z. W.
Birnbaum and Monroe G. Sirken, University of Washington, Seattle.

In opinion-polls and other sampling surveys, a response can only be obtained from those
individuals of a sample who are available for interviewing. Let $p_{\cdot 1}$ be the probability that
an individual chosen at random from the population answers “yes” to a question, $p_{1\cdot}$ that
an individual is available for interviewing, and $p_{11}$ that an individual is available and
answers “yes.” Usually one wishes to estimate the parameter $p_{\cdot 1}$, but from a sample it
is only possible to estimate $p_{11}/p_{1\cdot} = p'$ = the probability that an individual answers “yes”
if he is available. Thus the total error in estimating $p_{\cdot 1}$ from a sample contains two
components: the bias $p_{\cdot 1} - p'$ and the sampling error. In this paper a technique is presented
in which individuals not available at a call are called upon repeatedly, up to $k$ times. It
is shown how, for a given upper bound of the total error at a prescribed probability level
and a given $k$, it is possible to minimize the cost of the survey by optimizing the relationship
between the greatest possible bias and the sampling error.
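
The bias component described above can be made concrete with a small Python sketch. It assumes the population is described by four cell probabilities (available or not, "yes" or "no"); the argument names and the example numbers are hypothetical and serve only to illustrate the arithmetic, not the cost optimization of the paper.

```python
def availability_bias(p11, p10, p01, p00):
    """Return (P(yes), P(yes | available), bias) from the 2x2 cell probabilities.

    First index: 1 = available, 0 = not available.
    Second index: 1 = answers "yes", 0 = answers "no".
    """
    if abs(p11 + p10 + p01 + p00 - 1.0) > 1e-9:
        raise ValueError("cell probabilities must sum to one")
    p_yes = p11 + p01             # probability of "yes" in the whole population
    p_available = p11 + p10       # probability of being available
    p_prime = p11 / p_available   # what a sample of available people estimates
    return p_yes, p_prime, p_yes - p_prime


if __name__ == "__main__":
    print(availability_bias(0.3, 0.3, 0.3, 0.1))
```

With these illustrative numbers the people who are not at home answer "yes" more often than those who are, so interviewing only the available respondents understates the population proportion.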


(Abstracts of papers presented at the Cleveland Meeting of the Institute on December
27-30, 1948.)


21. A Necessary Condition for a Certain Class of Characteristic Functions
(Preliminary report). Eugene Lukacs, NOTS, Inyokern, California and
Our Lady of Cincinnati College, Cincinnati, Ohio.


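Let $\varphi(t) = \left[\left(1 - \frac{t}{v_1}\right)\left(1 - \frac{t}{v_2}\right)\cdots\left(1 - \frac{t}{v_n}\right)\right]^{-1}$ be the reciprocal of a polynomial without
multiple roots. The following necessary condition is derived which $\varphi(t)$ has to satisfy in
order to be the Fourier transform (characteristic function) of a distribution.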





If $\varphi(t)$ is the Fourier transform of a distribution, then

1) $\varphi(t)$ has no real roots. If $b + ia$ ($a \neq 0$, $b \neq 0$) is a root then $-b + ia$ is also a root.
That is, the roots of $\varphi(t)$ are either located on the imaginary axis or are symmetrical to this
axis.

2) If $b + ia$ ($a \neq 0$) is a root then there exists also at least one root $i\alpha$ so that sign $\alpha$ =
sign $a$ and $|\alpha| \le |a|$.

As a particular case one obtains the well known fact that $(1 + t^4)^{-1}$ cannot be a characteristic
function.
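
The two root conditions can be checked mechanically once the roots of the polynomial are known. The following Python sketch is an illustration only: the function name and the tolerance are assumptions, and the roots are supplied directly rather than computed. It confirms that the quartic case fails the second condition, while the polynomial $1 + t^2$ (the reciprocal of the Laplace characteristic function) passes both.

```python
import cmath


def satisfies_root_conditions(roots, tol=1e-9):
    """Check the two necessary conditions on the roots of the polynomial."""
    def is_root(z):
        return any(abs(z - r) <= tol for r in roots)

    for r in roots:
        b, a = r.real, r.imag
        if abs(a) <= tol:
            return False  # condition 1: a real root is not allowed
        if abs(b) > tol and not is_root(complex(-b, a)):
            return False  # condition 1: mirror image in the imaginary axis is missing
        # Condition 2: some purely imaginary root on the same side, no farther from 0.
        if not any(abs(s.real) <= tol and s.imag * a > 0
                   and abs(s.imag) <= abs(a) + tol for s in roots):
            return False
    return True


# Roots of 1 + t^4 are the four odd eighth roots of unity.
QUARTIC_ROOTS = [cmath.exp(1j * cmath.pi * (2 * k + 1) / 4) for k in range(4)]
# Roots of 1 + t^2.
LAPLACE_ROOTS = [1j, -1j]

if __name__ == "__main__":
    print(satisfies_root_conditions(QUARTIC_ROOTS))
    print(satisfies_root_conditions(LAPLACE_ROOTS))
```

Passing the check does not show that a function is a characteristic function, since the conditions are only necessary.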

22. Precision of Estimates from Samples Selected under Marginal Restrictions
(Preliminary Report). Clifford J. Maloney, Camp Detrick, Frederick,
Maryland.

Formulas are derived for estimates and for their variances computed from samples drawn 
at random subject only to marginal restrictions from populations classified by several 
characters, and estimates are made of the efficiency of such sampling plans compared to 
sampling with complete stratification or sampling completely at random. By means of two 
simple but general theorems it is shown that the variances are independent of the individual
values of the character being sampled for in the population and in the sample and depend 
only on the first two moments for each cell of the population. It is shown that in the large 
sample approximation a practical scheme for actually drawing such samples can be obtained 
by drawing a sample of size n entirely at random and using the results of Deming and 
Stephan (Annals of Math. Stat., Vol. 11 (1940), p. 427) to adjust the sample marginal totals
to the specified values. Deficient cells will of course be filled up by additional drawings. 
A measure is given of the relative loss of information in sampling with marginal restrictions 
on the sample cell numbers compared to sampling with complete stratification. If $a_{ij}$
represents the population mean in the $ij$th cell, $r_i$ the population mean in the $i$th row and
$c_j$ the population mean in the $j$th column, and if $a_{ij}$ is of the form $a_{ij} = a + r_i + c_j$, then
marginally restricted sampling is as efficient as sampling with complete stratification. For
arbitrary $a_{ij}$ a measure of the relative efficiency compared to sampling completely at random
is given by the relative degrees of freedom for the sample cell numbers. A comparison
with other possible sampling procedures is given.
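
The adjustment of sample marginal totals to specified values, attributed above to Deming and Stephan, is commonly carried out by iterative proportional fitting. The Python sketch below assumes a two-way table of positive counts; the function name, the fixed iteration count and the example table are illustrative choices, not details taken from the abstract.

```python
def fit_margins(table, row_targets, col_targets, iterations=100):
    """Rescale rows and columns in turn until both sets of margins match the targets."""
    cells = [[float(v) for v in row] for row in table]
    for _ in range(iterations):
        # Scale each row to its target total.
        for i, target in enumerate(row_targets):
            total = sum(cells[i])
            if total > 0:
                cells[i] = [v * target / total for v in cells[i]]
        # Scale each column to its target total.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in cells)
            if total > 0:
                for row in cells:
                    row[j] *= target / total
    return cells


if __name__ == "__main__":
    adjusted = fit_margins([[3, 1], [2, 4]], row_targets=[5, 5], col_targets=[4, 6])
    for row in adjusted:
        print(row)
```

The fitted cells are generally not integers, which is consistent with the abstract's remark that deficient cells are filled up by additional drawings.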

23. Properties of Maximum- and Quasi-Maximum Likelihood Estimates of
Parameters of a System of Linear Stochastic Difference Equations with
Serially Correlated Disturbances (Preliminary Report). Herman Rubin,
Cowles Commission, The University of Chicago.

Let $A_{ux} x_t' = u_t'$ be a complete system of linear stochastic difference equations,
$x_t = (y_t, z_t)$, $y_t$ jointly dependent, $z_t$ predetermined. Let us suppose $u_t' + B_{vu} u_{t-1}' = v_t'$, where
the random vectors $v_t$ are serially independent and have mean zero. If the vectors $v_t$
have the same Gaussian distribution, and the system is identified, we can obtain maximum-likelihood
estimates; if the distributions are not identical Gaussian, quasi-maximum-likelihood
estimates result. The identification problem is a special case of that with independent
Ui and bilinear restrictions on some A*, , if the restrictions on A*, are linear or bilinear. 
As in that case, we may have multiple identification. However, the special aspects of this 
type of system yield some help in the discussion of the identification problem. We also 
observe that if the system is identified, we obtain consistency and asymptotic normality 
of the estimates under the same conditions as with serially independent $u$'s for $A_{ux}$.

24. The Computation of Maximum Likelihood Estimates of Parameters of a
System of Linear Stochastic Difference Equations with Serially Correlated
Disturbances. Herman Chernoff, Cowles Commission, The University
of Chicago.

Consider the structural equations $A_{ux} x_t' = u_t'$ where the vector $x_t = [y_t, z_t]$, $y_t$ are the
jointly dependent, and $z_t$ the predetermined variables and where $u_t$ are serially correlated.
In particular assume that the disturbances $u_t$ satisfy the simple Markoff Process
$u_t' + B_{vu} u_{t-1}' = v_t'$ where $v_t$ is a stationary serially uncorrelated Gaussian Process with zero
mean. Then we have $A_{ux} x_t' + B_{vu} A_{ux} x_{t-1}' = v_t'$. The estimates of $B_{vu}$ and $E\{v_t' v_t\}$ can be
simply expressed in terms of those of $A_{ux}$. It is shown that iterative gradient methods of
maximization require about 2 to 3 times as much work per iteration as in the serially
uncorrelated case. To apply the Newton Method about 8 times as much work per iteration
is required. The Newton Method uses the second order terms of the expansion of the log
of the likelihood in terms of the independent parameters of $A_{ux}$ and these can be used to
obtain estimates of the asymptotic covariance of the estimates.


25. Test Criteria for Hypotheses of Symmetry and Definiteness of a Regression
Matrix for Demand Functions. Uttam Chand, University of North
Carolina.

The importance of relations between two sets of variates (e.g. the study of relations of 
the prices to the quantities of several commodities) invariant under linear transformations 
of one set of variates contragredient to those of the other was first pointed out by Hotelling.
In the study of related demand functions no suitable statistical tests have existed for 
testing the hypotheses of symmetry and negative definiteness of the regression matrix of 
prices on quantities. The test proposed here for the hypothesis of symmetry is exact and 
invariant under all contragredient transformations. A separate test studied for both 
symmetry and negative definiteness satisfies the property of invariance but its distribution 
depends on a nuisance parameter which is the non-zero root of a certain determinantal 
equation. The likelihood ratio criterion under the hypothesis of symmetry leads to a multilateral
matric equation which represents $\tfrac{1}{2}p(p+1)$ equations of the third degree in $\tfrac{1}{2}p(p+1)$
unknown regression coefficients for the p-variate case, and does not admit of a unique 
solution. 


26. The Distribution of Extreme Values in Samples whose Members are Stochastically
Dependent. Benjamin Epstein, Wayne University, Detroit.

In this paper the following problem is considered. To find the distribution of largest
and smallest values in samples of size $n$ drawn from a random process subject to the following
conditions:

(i) observations $x_1, x_2, \ldots, x_n$ are taken in order from some random process.

(ii) the random process is such that successive observations $x_i$ and $x_{i+1}$ are jointly
dependent. The joint distribution is described analytically, independently of $i$,
by a two-dimensional d.f.

$F_2(x, y) = \text{Prob}(x_i < x, x_{i+1} < y)$, $1 \le i \le n - 1$.

(iii) $F_2(x, y) = F_2(y, x)$.

(iv) Any other pairs of observations $(x_i, x_{i+j})$, $1 \le i \le n - 1$, $2 \le j \le n - 1$, are assumed
to be independent.

The results in this paper generalize the special situation where all observations are independent.
More general cases than those covered by (i)-(iv) are briefly considered.





27. On Age-Dependent Stochastic Branching Processes. Richard Bellman
and Theodore E. Harris, Stanford University and The RAND Corporation,
Palo Alto and Santa Monica, California.

An initial particle has a random life length $T$ with c.d.f. $G(t)$. At death it is replaced
by a random number $N$ of similar particles; $P(N = n) = q_n$. Particles produced have the
same distributions of life-length and replacement as the original one.

Let $z(t)$ = number of particles at time $t$, $h(s) = \sum_{n=0}^{\infty} q_n s^n$, $F(s, t) = \sum_{n=0}^{\infty} P(z(t) = n) s^n$. The
integral equation $F(s, t) = \int_0^t h[F(s, t - y)]\, dG(y) + s[1 - G(t)]$ uniquely determines
$F(s, t)$. When suitable restrictions are put on $h(s)$ and $G(t)$, results of Feller can be applied
to study the asymptotic behavior of the moments of $z(t)$, which satisfy linear integral
equations of the convolution type, and further special results on the moments can be
obtained. The condition $\sum n q_n > 1$ and certain further restrictions insure that $z(t)/e^{bt}$
converges in probability as $t \to \infty$, where $b$ is a certain constant. The m.g.f. $\phi(s)$ of the
limiting distribution satisfies the equation $\phi(s) = \int_0^\infty h[\phi(s e^{-by})]\, dG(y)$. Further restrictions
imply that $\phi(s)$ is analytic in a neighborhood of $s = 0$, and that the corresponding
distribution is absolutely continuous.
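
As an illustration of the convolution-type equations for the moments, differentiating the integral equation for the generating function at $s = 1$ gives, for the mean number of particles $m(t)$, the equation $m(t) = \mu \int_0^t m(t - y)\, dG(y) + 1 - G(t)$, where $\mu$ is the mean number of replacements. This step is a reader's derivation and is not quoted from the abstract. The Python sketch below solves the equation on a grid by a simple first-order scheme. The function name, the grid size, and the choice of an exponential life length (for which $m(t) = e^{(\mu - 1)t}$ when the rate is one) are assumptions made only to give a checkable example.

```python
import math


def mean_particles(mu, G, t_max, steps):
    """Approximate m(t) on an even grid; assumes G(0) = 0 and finite mean offspring mu."""
    h = t_max / steps
    G_values = [G(k * h) for k in range(steps + 1)]
    m = [1.0 - G_values[0]]  # one particle at time zero
    for k in range(1, steps + 1):
        # Riemann-Stieltjes sum for the convolution of m with dG.
        conv = sum(m[k - j] * (G_values[j] - G_values[j - 1]) for j in range(1, k + 1))
        m.append(mu * conv + 1.0 - G_values[k])
    return m


if __name__ == "__main__":
    m = mean_particles(2.0, lambda t: 1.0 - math.exp(-t), 1.0, 1000)
    print(m[-1], math.e)
```

The scheme is only first-order accurate, so the grid value at $t = 1$ agrees with $e$ to about two or three decimal places at this step size.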

28. Cuboidal Lattices. G. S. Watson, Institute of Statistics, North Carolina. 

Yates has given two series of partially balanced incomplete block designs, square and
cubic lattices, which enable the experimenter to test respectively $k^2$ and $k^3$ varieties in
blocks of size $k$. Harshbarger has recently given a series of designs, rectangular lattices,
which supplement Yates' square lattices.

In this paper two series of designs are given called cuboidal lattices, supplementing the
cubic lattice series. They may be used to test respectively $k^2(k + 1)$ and $k(k + 1)^2$ varieties
in blocks of $k$, when the number of replications is a multiple of 3. Interblock information
may be recovered. The first series has a relatively simple analysis and should prove useful.
This work was sponsored by the Office of Naval Research.

29. Transformations Induced by Series Approximation of Prior Probability 
Amplitude. Archie Blake, Office of The Surgeon General, U. S. Army. 

Consider a class $A$ of mutually exclusive and exhaustive possible outcomes of a test.
(We assume $A$ finite; this condition can under suitable conditions be removed at a later
stage by a limiting process.) For a hypothesis $h$ let $u$ be the vector whose value, for each
member $a$ of $A$, is the square root of the prior probability of $a$ and $h$ jointly. This vector
is called the probability amplitude; its norm, the scalar product $u'u$, is proportional to the
prior probability of $h$, the constant of proportionality being determined by comparing the
norms of the $u$'s for all $h$. Let the test leave the alternatives of a subclass $B$ of $A$ still
possible, while ruling out the members of $A - B$. Represent this test by a vector $r$ having
the value 1 on $B$ and 0 on $A - B$. Define $d$ on $AA$ as a matrix equal to $r$ on the main diagonal
and zero elsewhere. The posterior probability is proportional to the form value
$u'du$, the norm of the projection of $u$ on a subspace determined by suppressing the
coordinates of $A - B$. Consider the transformation $u = tv$, $t$ being a matrix on $AA$ and $v$
a vector on $A$. Then $u'du$ takes the form $v't'dtv$. Denote $t'dt$ by $e$. If $u$ is approximated
as a partial sum of the series $tv$, i.e. by truncating $v$ with a subclass $C$ of $A$, the truncation
induced on $e$ is that with the minor on $CC$. (How much of the prior probability norm is





retained with a particular truncation is most easily seen if $t$ is orthogonal, for then the
transform of $u'u$ is $v'v$).

For example, in an agricultural experiment, let $A$ be the composite of $P$, the class of
plots, and $F$, the class of possible yields on a plot. Then $u$ takes the form of a second
order tensor or matrix on $PF$, while $d$ and $t$ are fourth order tensors. For some member
$y$ of $F$, it often happens that some of the initially most probable, numerous, and economically
consequential hypotheses will be such that for them the values of $u(y)$ are predominantly
high on some row of plots, low on another row, etc. The transformation $u = tv$
on $P$ induces the transformation $e = t'dt$; this is R. A. Fisher's transformation, performed,
however, on $d$ instead of on the yields themselves. The truncation of $v$ and $e$ corresponds
to Fisher's relegating the higher interactions to error. This calculation may be accompanied
by a linear transformation on $F$, e.g. in series of orthogonal functions. (Such
series are not subject to the disadvantage of classical Gram-Charlier series, which are
expressed in terms of the probability instead of its square root, that their partial sums
can be at places negative.)


30. On the Utilization of Marked Specimens in Estimating Populations of 
Flying Insects. Cecil C. Craig, University of Michigan, Ann Arbor. 

The experimenter catches flying insects, say butterflies, marks and immediately releases 
them. It is assumed that all the insects in a segregated area are equally liable to capture 
whether unmarked or marked, even several times, and that the population is stable for this 
period over which a series of captures is made. From the record of insects caught once, 
twice, three times, and so on, the problem is to estimate the total population. Two mathematical
models which seem appropriate are considered and four methods of estimation are
compared with respect to the large sample variances of the estimates they give.


31. On a Probability Distribution. Max A. Woodbury, University of Michigan,
Ann Arbor.

In this paper the probability of $x$ successes in $n$ trials of an event is computed for the
case when the probability of success in a given trial depends only on the number of previous
successes. The solution $P(n, x)$ satisfies the equation of partial differences

$P(n + 1, x + 1) = (q - q_x) P(n, x) + q_{x+1} P(n, x + 1)$

in the case when $q = 1$. The boundary conditions are obviously $P(0, 0) = 1$ and $P(n, x) = 0$
for $x < 0$ or $x > n$. The solution of this equation is obtained by use of a generating function
and $P(n, x)$ proves to be the $x$th term in the expansion of $q^n$ by means of Newton's
divided difference formula given the values $q_0, \ldots, q_i, \ldots, q_x$. Specifically, by setting
$q = 1$, one obtains the result

$P(n, x) = p_0 p_1 \cdots p_{x-1} \sum_{i=0}^{x} q_i^n / [(q_i - q_0)(q_i - q_1) \cdots (q_i - q_{i-1})(q_i - q_{i+1}) \cdots (q_i - q_x)]$.

In the case $p_x = p_0$ one has the result

$P(n, x) = \frac{p_0^x}{x!} \frac{d^x q^n}{dq^x}$,

which yields the usual result on simplification.
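
The agreement between the difference equation and the divided-difference form of the solution can be checked numerically. The Python sketch below is an illustration under stated assumptions: the success probability after $x$ previous successes is written `p[x]`, with failure probability `1 - p[x]`, the failure probabilities are taken to be all distinct (as the closed form requires), and the function names and example values are hypothetical. Exact fractions are used so that the two computations can be compared without rounding.

```python
from fractions import Fraction
from math import prod


def by_recurrence(n, p):
    """P(n, x) for x = 0..len(p)-1, stepping one trial at a time."""
    q = [1 - px for px in p]
    P = [Fraction(1)] + [Fraction(0)] * (len(p) - 1)  # P(0, 0) = 1
    for _ in range(n):
        # Stay at x with a failure, or arrive from x - 1 with a success.
        P = [q[x] * P[x] + (p[x - 1] * P[x - 1] if x > 0 else 0)
             for x in range(len(p))]
    return P


def by_divided_difference(n, x, p):
    """Closed form: product of successes times a divided difference of q**n."""
    q = [1 - px for px in p]
    total = sum(q[i] ** n / prod(q[i] - q[j] for j in range(x + 1) if j != i)
                for i in range(x + 1))
    return prod(p[:x]) * total


if __name__ == "__main__":
    p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 5), Fraction(1, 7)]
    print(by_recurrence(6, p))
    print([by_divided_difference(6, x, p) for x in range(len(p))])
```

When all the `p[x]` are equal the closed form cannot be evaluated directly, because the denominators vanish; that is the confluent case covered by the derivative formula in the abstract.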


32. Distribution-Free Tests of Data from Factorial Experiments. G. W. Brown
and A. M. Mood, Iowa State College.

A device for avoiding the assumption of normality in analysis of variance problems was 





developed by M. Friedman (Am. Stat. Assoc. Jour., Vol. 32 (1937), pp. 675-701) in which the
values of the observations were replaced by their ranks. 

An alternative approach is presented here in which medians are used to construct certain
contingency tables, and the various null hypotheses of interest are easily tested by means
of the ordinary chi-square criterion applied to such tables. These tests: 

(1) Avoid the assumption of normality. 

(2) Are particularly sensitive to differences in locations of cell distributions but not to 
their shapes. 

(3) Usually require very little arithmetic computation. 

The tests and the relevant distribution theory have been worked out for some of the 
simpler experimental designs. 
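
For the simplest case, a one-way layout with several groups, the idea of building a contingency table from medians can be sketched in Python as follows. This is an illustration under assumptions, not the authors' worked procedure: observations are classed as above or not above the pooled median, and the ordinary chi-square statistic is computed for the resulting two-row table. The function name and the example data are hypothetical.

```python
from statistics import median


def median_table_chi_square(groups):
    """Return (chi-square statistic, degrees of freedom) for the above/not-above table."""
    pooled = [x for g in groups for x in g]
    grand_median = median(pooled)
    n = len(pooled)
    above = [sum(1 for x in g if x > grand_median) for g in groups]
    total_above = sum(above)
    total_not_above = n - total_above
    chi_square = 0.0
    for g, a in zip(groups, above):
        for observed, column_total in ((a, total_above), (len(g) - a, total_not_above)):
            expected = len(g) * column_total / n
            if expected > 0:
                chi_square += (observed - expected) ** 2 / expected
    return chi_square, len(groups) - 1


if __name__ == "__main__":
    print(median_table_chi_square([[1, 2, 3, 4], [5, 6, 7, 8]]))
```

The arithmetic involves only counting and a handful of divisions, in keeping with the third property listed in the abstract.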


33. On Sums of Symmetrically Truncated Normal Random Variables. Fred 
C. Andrews and Z. W. Birnbaum, University of Washington, Seattle. 

Let $X_a$ be the random variable with the probability density

$f_a(x) = C e^{-x^2/2}$ for $|x| < a$, $f_a(x) = 0$ for $|x| > a$,

and let $S_a^{(n)} = \sum_{i=1}^{n} X_a^{(i)}$ where $X_a^{(1)}, \ldots, X_a^{(n)}$ are independent determinations of $X_a$.

The problem considered is: for given $n$, $T > 0$, $\epsilon > 0$, determine $a$ such that
$P(|S_a^{(n)}| > T) = \epsilon$. The exact solution of this problem would require laborious computations.
In this paper a method is given for obtaining approximate values of $a$ which are
“safe,” i.e. such that $P(|S_a^{(n)}| > T) \le \epsilon$.

34. On the Foundation of Statistics. Max A. Woodbury, University of 
Michigan, Ann Arbor. 

The results in this paper are part of the author’s University of Michigan dissertation,
“Probability and Expected Values.” The work covered by this paper was sponsored by 
the Office of Naval Research. One may take the notion of an expected value as the basis 
for the theory of Statistics; i.e. a linear functional on a linear space of random variables 
(real valued functions defined over a population). The space is called statistical if it contains
all constant functions and the expected value of such constant functions is just the
constant and if the expected value of a non-negative function is non-negative. A statistical 
space is called strong if it contains with a random variable also the random variable whose 
values are the absolute values of the given random variable. Every expected value defines 
a probability measure over a quorum of subsets of the population and it is shown that the 
integral of the random variable, if it exists, coincides with the expected value. Further 
it is shown that if the statistical space is strong the integral necessarily exists and also 
that a necessary and sufficient condition that the quorum be a field is that the statistical 
space be strong. 

35. Finitely Additive Probability Functions. Max A. Woodbury, University 
of Michigan, Ann Arbor. 

The results in this paper are part of the work in the author’s University of Michigan 
dissertation, “Probability and Expected Values.” The work covered by this paper was 
sponsored by the Office of Naval Research. A quorum is a family of sets that contains 
with each pair of disjoint sets also their union and also the complement of any of its sets. 
Trivially a quorum is required to contain at least one set and hence at least the universe set 
or population and the empty set. An extension of the notion of a finitely additive probability
measure function to quorums is given and proved to be equivalent to the usual





definition in case the quorum is a field of sets. The extension of a quorum of sets relative 
to the probability measure function is investigated using the properties of the inner and 
outer measure. The upper and lower integrals are defined and a condition for the existence 
of the integral is given. When the quorum is a field it is shown that integrability of a 
function implies the existence of the distribution function. This last result is well known 
in the case where the probability measure function is completely additive. 

36. On Inverting a Matrix via the Gram-Schmidt Orthogonalization Process.
Max A. Woodbury, University of Michigan, Ann Arbor.

The application of the classical Gram-Schmidt orthogonalization process to the factorization
of a correlation matrix is accomplished by considering the inner product $[x, y] = E(xy)$
in the linear space determined by the statistical variables $X_1, X_2, \ldots, X_n$. In this
way a representation of the original set of statistical variables in terms of an orthonormal
set is obtained. (By an orthonormal set we mean a set $\xi_1, \xi_2, \ldots, \xi_n$ such that $E(\xi_i \xi_j) = 0$
for $i \neq j$ and $E(\xi_i^2) = 1$.) The matrix of coefficients $B = (b_{ij})$, where $X_i = \sum_{j=1}^{i} b_{ij} \xi_j$, has
the property that $C = BB'$ where $C = (E(X_i X_j))$ and $'$ denotes the transpose. Further the
matrix $B$ is triangular hence $B^{-1}$ is readily computed, from which one obtains at once
$C^{-1} = (B^{-1})' B^{-1}$. The quantities $b_{ij}$ are readily obtainable by the method of determinants
(Dwyer and Waugh, Annals of Math. Stat., Vol. 16 (1945), pp. 259-271, cf. pg. 264) formerly
called the method of multiplication and subtraction with division.
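
The factorization into a triangular matrix times its transpose, followed by inversion of the triangular factor, can be sketched in Python as below. The sketch computes the triangular coefficients by the usual square-root recursion, which yields the same factor that Gram-Schmidt orthogonalization in the inner product $E(xy)$ produces; it is not the method of determinants cited in the abstract. The function names and the example matrix are illustrative, and the input is assumed symmetric and positive definite.

```python
import math


def triangular_factor(C):
    """Lower-triangular B with C = B B' (C symmetric positive definite)."""
    n = len(C)
    B = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = C[i][j] - sum(B[i][k] * B[j][k] for k in range(j))
            B[i][j] = math.sqrt(s) if i == j else s / B[j][j]
    return B


def invert_lower_triangular(B):
    """Inverse of a lower-triangular matrix by forward substitution."""
    n = len(B)
    inv = [[0.0] * n for _ in range(n)]
    for i in range(n):
        inv[i][i] = 1.0 / B[i][i]
        for j in range(i):
            inv[i][j] = -sum(B[i][k] * inv[k][j] for k in range(j, i)) / B[i][i]
    return inv


def invert_via_factorization(C):
    """Inverse of C as (B^-1)' B^-1."""
    B_inv = invert_lower_triangular(triangular_factor(C))
    n = len(C)
    return [[sum(B_inv[k][i] * B_inv[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    print(invert_via_factorization([[4.0, 2.0], [2.0, 3.0]]))
```

Because the factor is triangular, its inverse needs only forward substitution, which is the computational advantage the abstract points to.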

37. Certain Properties of the Multiparameter Unbiased Estimates. G. K.
Seth, Iowa State College.

If $\theta^* = (\theta_1^*, \theta_2^*, \ldots, \theta_q^*)$ is an unbiased vector estimate of $\theta = (\theta_1, \theta_2, \ldots, \theta_q)$ in the
density function $p(x_1, x_2, \ldots, x_n; \theta_1, \theta_2, \ldots, \theta_q)$ having the smallest concentration
ellipsoid among the class of unbiased estimates of $\theta$, and further if $e$ is any statistic of $q$
components having $E(e) = 0$ and finite covariance matrix, then $e$ is uncorrelated with $\theta^*$.

If a set of sufficient statistics $(T_1, T_2, \ldots, T_p)$, $p < q$, exists for estimating $\theta$, then
corresponding to any unbiased vector estimate $\theta^*$ of $\theta$ there exists an unbiased estimate
of $\theta$ depending on $T_1, T_2, \ldots, T_p$ alone, where the latter has a concentration ellipsoid
equal to or contained in that of the former.

When $q = 1$, and $\phi^*$ has the smallest variance among the class $c$ formed by unbiased estimates
of $\theta$ which are functions of $\theta^*$ having a finite variance, and the set of polynomials
with respect to the distribution function of $\theta^*$ is complete, then $\phi^*$ is the only element in
the class $c$. For $q > 1$, the result holds when the “variance” is replaced by the “concentration
ellipsoid.”

38. A Class of Lower Bounds for the Variance of Point Estimates. Douglas
Chapman, University of California, Berkeley. 

A class of lower bounds for the variance of point estimates is derived by means of the
calculus of finite differences under very weak restrictions and it is shown that they give
valid lower bounds for certain parameter estimation problems for which the Cramér-Rao
formula is invalid. In some cases even when the latter lower bound exists a sharper lower
bound may be found in the class here defined. On the other hand when it exists, the Cramér-Rao
lower bound is asymptotically superior to any of this class.

39. Standard Errors and Tests of Significance for Interpolated Medians.
Churchill Eisenhart and Miriam L. Yevick, National Bureau of Standards.





If a sample of $N$ observations is grouped by a sequence of class intervals with boundaries
$-\infty, \ldots, x_{-1}, x_0, x_1, x_2, \ldots, +\infty$, where $x_0$ is the largest boundary point for which
the observed ‘fraction below’, $p_b$, is less than $\tfrac{1}{2}$, and $x_1$ is the smallest boundary point for
which the observed ‘fraction above’, $p_a$, is less than $\tfrac{1}{2}$, so that the observed ‘central fraction’,
$p_c$, between $x_0$ and $x_1$ is positive, then, at least for the case of $N$ large, standard textbooks
take as the median of the grouped data the interpolated median

$m = x_0 + b(x_1 - x_0)$

where

$b = (\tfrac{1}{2} - p_b)/p_c$.

The literature is silent regarding the sampling properties of such medians, and regarding
tests of significance appropriate to them. Let $P_b$ and $P_c$ be the population fractions
below $x_0$, and between $x_0$ and $x_1$, respectively, and let $\mu$ and $\beta$ be the population analogs
of $m$ and $b$ obtained by replacing $p_b$ and $p_c$ in the above equations by $P_b$ and $P_c$, respectively.
It is shown that $m$ is asymptotically normally distributed about $\mu$ so defined with
asymptotic variance given by

$\frac{1}{N y^2} [P_b(1 - P_b) - 2\beta P_b P_c + \beta^2 P_c(1 - P_c)]$

where $y = P_c/(x_1 - x_0)$ = ordinate of ‘central rectangle’ of ‘population histogram’. The
classical formula for the variance of a median can be obtained as the limit of the above
when $(x_1 - x_0) \to 0$ with $P_b \to \tfrac{1}{2}$.

In addition, tests of hypotheses regarding the value of the ‘interpolated median of the
population’, $\mu$, and regarding the difference, $\mu_2 - \mu_1$, of the interpolated medians of two
populations, are developed (1) by utilizing the above asymptotic results, and (2) by utilizing
the Neyman-Pearson likelihood-ratio-test approach.
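
The interpolated median and a plug-in version of its asymptotic variance can be computed directly from grouped counts. The Python sketch below rests on two assumptions that are not part of the abstract: all observations fall between the first and last boundary supplied, and the population fractions in the variance expression are replaced by the observed fractions. The function name and the example data are hypothetical.

```python
def interpolated_median(boundaries, counts):
    """Return (interpolated median, plug-in variance) for grouped data.

    counts[i] is the number of observations between boundaries[i] and boundaries[i + 1].
    """
    n = sum(counts)
    cumulative = [0]
    for c in counts:
        cumulative.append(cumulative[-1] + c)
    # x0: largest boundary with fraction below less than one half.
    i0 = max(i for i, c in enumerate(cumulative) if 2 * c < n)
    # x1: smallest boundary with fraction above less than one half.
    i1 = min(i for i, c in enumerate(cumulative) if 2 * (n - c) < n)
    x0, x1 = boundaries[i0], boundaries[i1]
    p_b = cumulative[i0] / n
    p_c = (cumulative[i1] - cumulative[i0]) / n
    b = (0.5 - p_b) / p_c
    m = x0 + b * (x1 - x0)
    y = p_c / (x1 - x0)  # height of the central rectangle of the histogram
    variance = (p_b * (1 - p_b) - 2 * b * p_b * p_c + b * b * p_c * (1 - p_c)) / (n * y * y)
    return m, variance


if __name__ == "__main__":
    print(interpolated_median([0, 10, 20, 30], [2, 5, 3]))
```

In the example, two of ten observations lie below 10 and three lie above 20, so the median is interpolated six tenths of the way through the central interval.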


40. Some Efficient Range-Estimates of Variation. Nilan Norris, Hunter
College, New York.

The commonly used sample range (in the sense of the difference between the largest and 
smallest of the variates) is one of an unlimited number of range or difference-measures 
which can be used to scale parent populations. For samples drawn from a Type III universe,
the maximum-likelihood estimate of dispersion is given by $A - G$, where $A$ is the
sample arithmetic mean and $G$ is the sample geometric mean. For samples drawn from a
Type V universe, a 100% efficient estimate of absolute variation is given by $G - H$, where
G is the sample geometric mean and H is the sample harmonic mean. Under certain general 
conditions usually fulfilled, the standard errors of both of these range-measures of absolute 
dispersion may be estimated from expressions obtained by application of the Laplace- 
Liapounoff theorem. The two parametric methods of estimating absolute variation as 
developed in this paper are likely to be most useful when the form of the parent universe 
is known, and it is either too expensive or impossible to obtain samples large enough to 
permit the use of inefficient estimates. An example of such a case is the learning curve
encountered in the analysis of frequency of occurrence of aircraft accidents by hours of 
flying experience of pilots in training. E. J. G. Pitman, Proc. Camb. Phil. Soc., Vol. 33 
(1937), pp. 217-218, has discussed the scaling of the Type III distribution. The method 
of scaling given by Pitman differs from the method of estimation developed in this paper for 
the Type III universe. 
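
The two difference-measures named in the abstract, arithmetic mean minus geometric mean and geometric mean minus harmonic mean, are easy to compute for a positive sample. The Python sketch below uses the standard library only; the function name and the sample values are illustrative, and no claim is made here about the standard-error expressions derived in the paper.

```python
from statistics import geometric_mean, harmonic_mean, mean


def range_estimates(sample):
    """Return (A - G, G - H) for a sample of positive values."""
    A = mean(sample)
    G = geometric_mean(sample)
    H = harmonic_mean(sample)
    return A - G, G - H


if __name__ == "__main__":
    print(range_estimates([1.0, 2.0, 4.0]))
```

Both differences are non-negative for any positive sample and vanish only when all observations are equal, which is what lets them serve as measures of dispersion.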



NEWS AND NOTICES 

Readers are invited to submit to the Secretary of the Institute news items of interest.

Personal Items 

Dr. Franz L. Alt has resigned his position with the Ballistic Research Laboratories
at Aberdeen to join the National Bureau of Standards where he is in charge
of the Computation Laboratory of the “National Applied Mathematics
Laboratory.”

Dr. Edward W. Barankin has been promoted to Assistant Professor and Research
Associate at the Statistical Laboratory, University of California, Berkeley,
California.

Dr. Stanley Clark has accepted an associate professorship of Education at 
the College of Education, University of Saskatchewan, Saskatoon, Canada. 

Dr. Gerald J. Cox has resigned his position as Research Chemist in the Chemical
Division of Corn Products Refining Co., Argo, Illinois to accept an appointment
as Professor of Dental Research in the School of Dentistry of the University
of Pittsburgh.

Mr. S. Lee Crump has resigned his assistant professorship at Iowa State College
to accept a position in the Atomic Energy Project, University of Rochester.

Dr. John H. Curtiss, Chief of the National Applied Mathematics Laboratories 
of the National Bureau of Standards, has assumed temporary additional duties 
as Acting Chief of the Institute for Numerical Analysis. The Institute for 
Numerical Analysis, located on the U.C.L.A. campus, was established by the 
National Bureau of Standards with the support of the Office of Naval Research 
and the United States Air Force for the two-fold purpose of pursuing mathematical
research aimed at the development of numerical techniques for the full exploitation
of the newer large-scale electronic computing machines and for performing
numerical computations basic to the extension of the frontiers of science.

Mr. Walter T. Federer has resigned his position at the Statistical Laboratory 
at the Iowa State College to accept a position as Professor of Biological Statistics
in the Department of Plant Breeding at Cornell University.

Dr. John Gurland, who received his Ph.D. in mathematical statistics from the 
University of California in August, 1948, is now a Benjamin Peirce Instructor in
Mathematics at Harvard University.

Dr. Joseph L. Hodges, Jr. has been promoted to Instructor and Research Associate
at the Statistical Laboratory, University of California, Berkeley.

Dr. Cyril J. Hoyt has resigned his position as Research Associate with the 
Department of Education at the University of Chicago to accept an appointment 
as Associate Director of the Bureau of Educational Research, University of 
Minnesota. 

Dr. Tjalling C. Koopmans has been promoted to Professor of Economics at 



the University of Chicago and also Director of Research of the Cowles Commission
for Research in Economics.

Dr. Eugene Lukacs, formerly at Our Lady of Cincinnati College, has accepted
a position as statistician at the United States Naval Ordnance Test Station at
Inyokern, California.

Mr. Frank Jones Massey, Jr., who has been in the Department of Mathematics 
at the University of Maryland, has accepted an assistant professorship in the 
Department of Mathematics at the University of Oregon, Eugene, Oregon. 

Miss Judith Moss has resigned her position at the National Bureau of Economic
Research and is now with the Port of New York Authority as an Economic
Analyst in the Planning Bureau.

Dr. Richard Otter has accepted an assistant professorship in the Department 
of Mathematics at the University of Notre Dame. 

Dr. Nathan Grier Parke, III has been appointed Research Fellow of the 
Massachusetts General Hospital and Associated Research Director of the Harvard
Piatric Study.

Dr. Joseph A. Pierce is now serving as Chairman of the Division of Natural 
Science and Mathematics at the Texas State University for Negroes, Houston 
4, Texas. 

Dr. Saul B. Sells, former Assistant to the President of the A. B. Frank Co. of
San Antonio, Texas, has joined the staff of the Department of Psychology of the 
Air University, School of Aviation Medicine, Randolph Field, Texas. 

Dr. Otis A. Pope, who was with the Office of Foreign Agricultural Relations, 
U. S. Department of Agriculture, Technical Collaboration Branch, Washington,
D. C., died September 28th, 1948.


Special Summer Session in Survey Research Techniques 

The Survey Research Center of the University of Michigan will hold its special
summer session in Survey Research Techniques from July 18 to August 13, 1949. 

The following courses will be offered: Introduction to Survey Research, Survey 
Research Methods, Sampling Methods in Survey Research (introductory and 
advanced), Mathematics of Sampling, Statistical Methods in Survey Research,
Techniques of Scaling. 

In addition the introductory courses will be given from June 20 to July 16. 
This will permit students who are attending the full eight-week summer session 
of the University (June 20 to August 13) to register for the introductory courses 
during the first four weeks. 

It is expected that this special session will attract men and women employed 
in market research or other statistical work and university instructors and graduate
students with a particular interest in this area of social science research.

All courses are offered for graduate credit and students must be admitted by 





the Graduate School. Inquiries should be addressed to the Survey Research 
Center, University of Michigan, Ann Arbor, Michigan. 


Summer Courses in Statistics at Michigan 

In addition to the special courses in Survey Research Techniques, the follow¬ 
ing courses of special interest to students of statistics are among those offered 
by the mathematics department of the University of Michigan in the Summer 
Session, June 20 to August 13: Finite Differences (Fischer), Probability (Copeland),
Theory of Statistics I and II (Carver), Significance Tests (Dwyer), Computational
Methods (Dwyer), Theory of Estimation and of Significance Tests
(Craig) and Seminar (Craig). 


The International Congress of Mathematicians 

No summer meeting of the Institute of Mathematical Statistics is planned for 
1950 because of the meeting of the International Congress of Mathematicians 
which will be held in Cambridge, Massachusetts, August 30 to September 6, 1950.
The following statement has been prepared by the organizing committee: 

An International Congress of Mathematicians will be held in Cambridge, Mas¬ 
sachusetts, in 1950 under the auspices of the American Mathematical Society. 
The Society originally planned to act as host for a Congress in September, 1940, 
which was also scheduled to meet in Cambridge. At the 1936 Congress in Oslo, 
Norway, the invitation for the 1940 Congress was issued by the American dele¬ 
gation in the name of the American Mathematical Society. Plans for the 1940 
Congress were practically completed when the outbreak of World War II in 
September, 1939, made it necessary for the Society to postpone the Congress to
a more favorable date. An Emergency Committee was established to carry on in 
the interim and, on recommendation of this Committee, the Council of the 
Society voted to hold the Congress in 1950. 

The 1950 Congress will be the third International Congress of Mathematicians 
to be held on the continent of North America. The first was held at North¬ 
western University in 1893 and the second at the University of Toronto in 1924. 
International Congresses were held at intervals of approximately four years, 
except when war intervened, until 1936. There has been no international gath¬ 
ering of mathematicians since that time and it is the sincere hope of the Or¬ 
ganizing Committee that the gathering in 1950 will be a truly international one, 
that the American mathematicians will attend in large numbers, and that all 
other countries will be well represented. The Council of the American Mathematical
Society has voted unanimously to hold a Congress which will be open to
mathematicians of all national and geographical groups. 

Time and Place. The dates for the Congress have been fixed as August 30-
September 6, 1950. Harvard University will be the principal host institution. 
A number of other institutions in metropolitan Boston will join in the entertain¬ 
ment of Congress visitors by arranging special features on their campuses. 


Type of Congress. In recent years mathematicians have been much impressed 
by the success of the conference method for presenting recent research in fields 
where vigorous advances have just been made or are in progress. In view of the 
success of mathematical conferences on special topics which have been held in 
Russia, France and Switzerland and, more recently, at the Princeton Bicentennial 
Celebration, the 1950 Congress will include Conferences in several fields. For 
the 1940 Congress, Conferences in four fields had been planned. The number of 
Conferences was thus restricted lest the introduction of a promising and novel 
feature result in failure through the dissipation of interest and energy. 

Following the established custom, the Organizing Committee plans to have a 
number of invited hour addresses by outstanding mathematicians. In addition, 
sectional meetings for the presentation of contributed papers not included in 
Conference programs will be held in the following fields: I, Algebra and Theory 
of Numbers; II, Analysis; III, Geometry and Topology; IV, Probability and 
Statistics, Actuarial Science, Economics; V, Mathematical Physics and Applied 
Mathematics; VI, Logic and Philosophy; VII, History and Education.

The official languages of the 1950 Congress will be English, French, German, 
Italian, and Russian. 

Organization. The plans for the Congress are under the supervision of an 
Organizing Committee which was elected by the Council of the American Mathematical
Society in February, 1948. The Chairman is Professor Garrett Birkhoff
of Harvard University and the Vice Chairman is Professor W. T. Martin of
Massachusetts Institute of Technology. Other members of the committee are:
Professors J. L. Doob, G. C. Evans, J. R. Kline, Solomon Lefschetz, Saunders
MacLane, Dean R. G. D. Richardson, Professors Oswald Veblen, J. L. Walsh, 
D. V. Widder, Norbert Wiener, and R. L. Wilder. 

Many of the subventions promised for the 1940 Congress are still available. 
A Financial Committee under the chairmanship of Professor John von Neumann 
is endeavoring to secure additional funds. Besides support from Harvard Uni¬ 
versity and Massachusetts Institute of Technology, generous subventions have 
been subscribed for the Congress by the Carnegie Corporation, the Institute for 
Advanced Study, the National Research Council, and the Rockefeller 
Foundation. 

An Editorial Committee under the chairmanship of Professor Salomon Bochner 
will assume responsibility for the publication of the Proceedings of the Congress. 

Professor J. R. Kline of the University of Pennsylvania has been named Secre¬ 
tary of the Congress and Dr. R. P. Boas, Executive Editor of Mathematical 
Reviews, has been designated Associate Secretary. 

Entertainment. Harvard University has offered the use of its dormitories and 
dining rooms for mathematicians and their guests for the period of the Congress. 
The Organizing Committee hopes that it will be possible to furnish room and 
board without charge to all mathematicians from outside continental North 
America who are members of the Congress. Congress membership fees and rates 
for room and board will be announced well in advance of the opening of the 
Congress. 



The Entertainment Committee, of which Professor L. H. Loomis of Harvard 
University is Chairman, is planning many interesting features, including a reception,
garden party, symphony concert, and banquet. It is hoped that American
mathematicians will be able to assist in the entertainment by putting their
automobiles at the disposal of the Entertainment Committee for trips to be 
made out of Cambridge. 

Every effort will be made to facilitate the travel at reasonable cost of foreign 
participants while in the United States. Previous to the Congress, opportunity 
will be given them to see New York City under the guidance of some mathe¬ 
maticians. 

Information. Detailed information will be sent in due course to individual
members of the American Mathematical Society and to foreign mathematical 
societies and academies. Others interested in receiving information may file 
their names in the office of the Society, and such persons will receive from time 
to time information regarding the program and arrangements. 

Communications should be addressed to the American Mathematical Society, 
531 West 116th Street, New York City 27, U. S. A. 


New Members 

The following persons have been elected to membership in the Institute 
(August 16, 1948 to November 30, 1948) 

Alman, John E., M.A. (Claremont Colleges) Instructor in Mathematics, College of Liberal
Arts, Boston University, 216 Gardner Road, Brookline 46, Massachusetts.

Andrian, Jane F., M.S. (Western Reserve Univ.) Graduate student at University of California,
1222 C. Ashby Avenue, Berkeley 2, California.

Arbuckle, Richard A., B.S. (Baldwin Wallace College) Research-Industrial Fellow at 
Purdue University, F.Ph.A. 630-3 Airport Road, Lafayette, Indiana.

Barankin, Edward W., Ph.D. (Univ. of Calif.) Assistant Professor of Mathematics and 
Research Associate in Statistical Laboratory, University of California, Berkeley, 
California. 

Blum, Julius R., Student in mathematical statistics at the University of California, 1957 
Acton Street, Berkeley 2, California.

Bronfenbrenner, Mrs. Jean, M.A. (Univ. of Chicago) Research Assistant, Cowles Com¬ 
mission, University of Chicago, Chicago 37, Illinois. 

Burns, Loren V., B.S. (Washburn College, Topeka, Kansas) Technical Director, MFA
Milling Co., Box 1585 S.S.S., Springfield, Missouri.

Clement, Edwin G., M.B.A. (Univ. of Chicago) Captain, Chief of Management Control 
Branch, Headquarters, Strategic Air Command, Andrews Air Force Base, Washing¬ 
ton 20, D. C. 

Cramer, George F., Ph.D. (Univ. of Missouri) Mathematician, U. S. Navy Department, 
Washington, D. C., 112 Quincy Street, Chevy Chase 16, Maryland.

Degan, James W., A.B. (Univ. of Chicago) Research Assistant, Psychometric Laboratory, 
University of Chicago, 1128 East 61st Street, Chicago 37, Illinois.

Dodd, Stuart C., Ph.D. (Princeton) Research Professor of Sociology and Director of Public 
Opinion Laboratory, 4135 — 4^th Avenue, N.E., Seattle 5, Washington.

Donnelly, Tom G., M.A. (Queen’s Univ.) Graduate student at the University of North 
Carolina, Room 213 “B”, Chapel Hill, North Carolina.



Edwards-Davies, Harold D., Special Lecturer, Department of Mathematics, Dalhousie 
University, 67 Seymour Street, Halifax, N.S., Canada.

Ellner, Henry, Ch.E. (College of City of New York) Statistician (Physical Sciences)
1-C Oak Grove Drive, Baltimore 20, Maryland.

Feigenbaum, Armand V., M.S. (Mass. Institute of Tech.) General Electric Company, 
Room 257, Building 23, Schenectady, New York. 

Festinger, Leon, Ph.D. (Univ. of Iowa) Assistant Professor of Psychology, Research Center
for Group Dynamics, University of Michigan, Ann Arbor, Michigan.

Frame, James S., Ph.D. (Harvard) Professor and Head of Department of Mathematics, 
Michigan State College, Lansing, Michigan. 

French, Benjamin J., M.Ed. (Univ. of New Hampshire) Examiner, Educational Testing 
Service, Matthews Road, Keene, New Hampshire. 

Gaffey, William R., A.B. (Univ. of Calif.) Research Assistant, University of California, 
2S06 Grant Street, Berkeley 4, California.

Goodman, Leo A., A.B. (Syracuse University) Research Assistant in Mathematical Sta¬ 
tistics and Graduate student at Princeton University, Fine Hall, Princeton Univer¬ 
sity, Princeton, New Jersey. 

Hader, Robert J., Ph.D. (North Carolina State College) Instructor and Research As¬ 
sistant, Institute of Statistics, North Carolina State College, Raleigh, North Carolina. 

Haley, Kenneth D., M.S. (Stanford Univ.) Assistant Professor of Mathematics, Acadia 
University, Wolfville, Nova Scotia, Canada.

Kahn, Louis B., M.S. (Univ. of Wisconsin) Research Associate, University of Wisconsin,
Box 16-F, Badger, Wisconsin. 

Katz, Irving, B.S. (College of City of New York) Statistician, Strategic Air Command, 
S79—37 Place, S.E., Washington 19, D. C.

Kientzle, Mary J., Ph.D. (Univ. of Ill.) Assistant Professor of Psychology, Department of 
Psychology, Washington State College, Pullman, Washington. 

Koditschek, Paul, LL.D. (Univ. of Vienna) Research Associate, Scientific Research Service,
Columbia University, 319 W. ISth Street, New York 14, New York.

Levin, Howard S., S.B. (Univ. of Chicago) Electronic Engineer, Glenn L. Martin Co., 
632 Addison Street, Chicago 13, Illinois. 

Levine, George J., B.S. (Brooklyn College) Actuarial Mathematician, 6109 — 1st Street, 
North, Arlington, Virginia. 

Liverman, J. G., B.A. (Cantab) Civil Servant, Ministry of Fuel and Power, 21 Ascot Court, 
Grove End Road, London, N.W. 8, England.

Loeve, Michel, Ph.D. (Sorbonne, Paris) Professor and Research Associate in Statistical
Laboratory, Durant Hall, University of California, Berkeley, California.

Loo, Ching-Tsu, Ph.D. (Univ. of Chicago) Research Associate, Statistical Laboratory,
University of California, Berkeley, California.

Lubin, Ardie, B.S. (Univ. of Chicago) Statistician, Psychology Department, Maudsley
Hospital, Denmark Hill, S.E. 5, London, England. 

Moses, Lincoln E., A.B. (Stanford Univ.) 7 Perry Lane, Menlo Park, California.

Mourier, Edith, Licencie-es-sciences (Univ. of Caen, France) Teaching Assistant, Statistical
Laboratory, University of California, Berkeley, California.

Osborne, Ernest L., L.L.B. (LaSalle Univ.) Economic Analyst, Department of the Army, 
Chancery Apartments, 3130 Wisconsin Avenue, N.W., Washington 10, D. C.

Pabst, William R. Jr., Ph.D. (Columbia Univ.) Quality Control Division, Bureau of Ordnance,
Navy Department, 3420 Quebec Street, N.W., Washington 16, D. C.

Plackett, Robin L., M.A. (Cambridge, England) Lecturer in Mathematical Statistics, 
Department of Applied Mathematics, The University, Liverpool 3, England. 

Proschan, Frank, M.A. (George Washington Univ.) Research Analyst, 1627 R. St., N.W.,
Washington 9, D. C. 



Rau, A. Ananthapadmanabha, M.S. (Iowa State College) Statistician and Agricultural
Meteorologist, Department of Agriculture, Bangalore, Mysore State, India. 

Rees, Mina, Ph.D. (Univ. of Chicago) Head, Mathematics Branch, Office of Naval Re¬ 
search, R2719, T-3 Building, Washington 25, D. C. 

Roberts, Spencer W. Jr., M.S. (Univ. of Michigan) Research Associate, University of 
Michigan Department of Engineering Research, SOS Thompson Street, Ann Arbor,
Michigan.

Sarma, S. C., M.Sc. (Calcutta Univ.) Graduate student in mathematical statistics at 
Columbia University, IISO John Jay Hall, Columbia University, New York 27,
New York. 

Schneiderman, Marvin A., B.S. (College of City of New York) Statistician, Biological, Na¬ 
tional Institute of Health, T-6, 2215, Bethesda, Maryland. 

Schull, William J., Ph.D. (Ohio State Univ.) Student at Ohio State University, Department
of Zoology, Ohio State University, Columbus 10, Ohio.

Schweld, Samuel, B.S.S. (College of City of New York) Statistician, Industry Division, 
Bureau of the Census, 1110 Monroe Street, N.W., Washington 10, D. C.

Wallace, David L., B.S. (Carnegie Institute of Tech.) Graduate Student and Teaching 
Assistant in Mathematics, Carnegie Institute of Technology, 1$S Lawrence Avenue,
Homestead Park, Pennsylvania.

Williams, Evan James, B.C. (Univ. of Tasmania) Research Officer, Section of Mathe¬ 
matical Statistics, Division of Forest Products, C.S.I.R., P.O. Box 18, South Mel¬ 
bourne, S.C. 4, Australia. 

Zavrotsky, Andres, Head of the Statistical Department of the Venezuela Office for Social 
Insurance, Mercedes a Luneta 39, Caracas.

Correction of New Members in June, 1948 issue: 

Lolzeller, Enrique Blanco, should be written as follows:

Blanco Lolzeller, Enrique. (Ph.D.) Professor of Statistics, Economics Faculty, Madrid
University, Spain, Nervion No. 4, Madrid, Spain.


ELECTION OF OFFICERS AND COUNCIL AND REVISION OF BY-LAWS 

At the membership meeting held at Cleveland on December 28, the following 
officers and members of the Council were elected: 


President:        J. Neyman
President-Elect:  J. L. Doob
Council:
  3-year term:    W. G. Cochran, C. Eisenhart, H. Hotelling, A. Wald
  2-year term:    W. Feller, P. G. Hoel, H. Scheffé, J. Wolfowitz
  1-year term:    Gertrude Cox, M. A. Girshick, J. W. Tukey, J. von Neumann



The By-Laws were also revised and further action was taken. More detailed 
accounts of this meeting will be sent directly to the members. 

Paul S. Dwyer 

Secretary 


REPORT ON THE SEATTLE MEETING OF THE INSTITUTE 

The thirty-sixth meeting and fourth Regional West Coast meeting of the 
Institute of Mathematical Statistics was held in Seattle, Washington, November 
26-27, 1948. The sessions of November 27, 1948 were held jointly with the
Biometric Society (Western N. A. Region). The meeting was attended by 91 
persons, including the following 22 members of the Institute: 

F. C. Andrews, E. W. Barankin, Z. W. Birnbaum, A. H. Bowker, D. G. Chapman, R. C. 
Davis, W. J. Dixon, E. Fay, M. A. Girshick, P. Horst, H. M. Hughes, J. C. R. Li, F. Massey, 
J. Neyman, E. Paulson, Elizabeth L. Scott, Esther Seiden, M. Sobel, Z. Szatrowski, J. R. 
Vatnsdal, J. E. Walsh and Zivia S. Wurtele. 

At the morning session on November 26, Professor R. M. Winger of the Uni¬ 
versity of Washington as chairman welcomed those attending the meetings, and 
the following program of contributed papers was presented:

1. Estimation of the Variance of the Bivariate Normal Distribution. 

Harry M. Hughes, University of California. 

2. Derivation of a Broad Class of Consistent Estimates. 

R. C. Davis, NOTS, Inyokern, California. 

3. Locally Best Unbiased Estimates. 

Edward W. Barankin, University of California. 

4. Some Problems Related to the Distribution of a Random Number of Random Variables. 

Edward Paulson, University of Washington. 

5. Asymptotic Expansions for the Distribution of Certain Likelihood Ratio Statistics. 

Albert H. Bowker, Stanford University. 

6. On a Problem of Confounding in Symmetrical Factorial Design. 

Esther Seiden, University of California. 

7. Some Bounded Significance Level Tests of Whether the Largest Observations of a Set are
Too Small.

John E. Walsh, Project RAND, Douglas Aircraft Corp., Santa Monica, Calif. 

The afternoon session of November 26, under the chairmanship of Professor 
J. Neyman of the University of California at Berkeley, had the following 
program: 

1. Invited paper: 

Multiple Decision Functions. 

M. A. Girshick, Stanford University. 

Contributed papers: 

2. Determination of Optimal Test Length to Maximize the Multiple Correlation Coefficient. 

Paul Horst, University of Washington. 

3. Some Numerical Comparisons of a Non-Parametric Test with Other Tests. 

F. J. Massey, University of Oregon. 



4. On the Deviation of Extreme Values. 

W. J. Dixon, University of Oregon. 

5. The Optimum Size of Interval for Making Measurements of a Rocket's Angular Velocity.
Edward A. Fay, University of California. 

6. Stationary Time Series Analysis and Common Stock Price Forecasting.

Zenon Szatrowski, University of Oregon. 

At the morning session of November 27, with Professor W. F. Thompson of the 
University of Washington as chairman, the program consisted of the following 
papers: 


1. Invited paper: 

On the Place of Statistics in Fishery Biology. 

Willis S. Rich, Stanford University and U. S. Fish and Wildlife Service. 
Contributed papers: 

2. Distribution of the Number of Schools of Fish Caught per Boat. 

J. Neyman, University of California. 

3. Some Problems in Fishery Research to which Statistical Methods are Applicable. 

Ralph Silliman, U. S. Fish and Wildlife Service, Seattle, Washington. 

4. The Application of the Hypergeometric Distribution to Problems of Estimating and Comparing
Zoological Population Sizes.

Douglas Chapman, University of California. 

5. Extension to Multivariate Case of Neyman's Smooth Test.

Elizabeth L. Scott, University of California. 

6. A Mathematical Theory of Vitamin A Metabolism in Fish. 

Norman E. Cooke, Pacific Fisheries Experimental Station, Vancouver, B.C.

The afternoon session of November 27 was held under the chairmanship of 
Professor F. W. Weymouth of Stanford University, with the following program: 


1. Invited paper: 

Statistical Problem of Enumeration of Fish Eggs in the Sea. 

Oscar E. Sette, U. S. Fish and Wildlife Service, San Francisco. 
Contributed papers: 

2. The Interactance Hypothesis.

Stuart C. Dodd, University of Washington. 

3. The Employment of Marked Members in Estimation of Animal Populations. 
Milner E. Schaefer, Stanford University. 

4. Non-Response and Repeated Call-Backs in Opinion Polls. 

Z. W. Birnbaum, University of Washington. 

5. Statistical Problems Relating to Fisheries.

J. L. Hart, Pacific Biological Station, Nanaimo, B. C. 


On November 26, at 6:30 o’clock there was a dinner for members and guests
at the Edmond Meany Hotel. 


Z. W. Birnbaum 


REPORT ON THE CLEVELAND MEETING OF THE INSTITUTE 

The Eleventh Annual Meeting of the Institute of Mathematical Statistics was 
held at the Statler Hotel, Cleveland, Ohio, on December 27-30, 1948. The
meeting was held in conjunction with the Annual Meeting of the American
Statistical Association. The following 176 members of the Institute were in 
attendance: 

P. H. Anderson, R. L. Anderson, L. W. Anderson, Max Astrachan, G. J. Auner, T. A.
Bancroft, B. Geoffrey, Z. W. Birnbaum, Archie Blake, E. E. Blanche, C. I. Bliss, Dorothy
S. Brady, A. E. Brandt, G. W. Brown, T. H. Brown, M. A. Brumbaugh, P. T. Bruyère, R. W.
Burgess, I. W. Burr, J. M. Cameron, A. G. Carlton, Harry Carver, F. R. Cella, Uttam Chand,
R. A. Chapman, Edmund Churchill, Herman Chernoff, W. G. Cochran, Jerome Cornfield,
J. H. Cover, Gertrude M. Cox, C. C. Craig, S. L. Crump, J. H. Curtiss, D. A. Darling, W.
L. Deemer, D. B. DeLury, W. E. Deming, Philip Desind, H. F. Dorn, C. W. Dunnett, P. S.
Dwyer, Churchill Eisenhart, Benjamin Epstein, C. D. Ferris, Leon Festinger, C. H. Fischer,
J. C. Flanagan, M. M. Flood, L. R. Frankel, D. A. S. Fraser, H. A. Freeman, Milton Friedman,
H. C. Fryer, E. V. Gardner, R. S. Gardner, H. H. Germond, William Gomberg, E. L.
Green, S. W. Greenhouse, J. Gurland, R. J. Hader, K. W. Halbert, H. J. Hand, M. H. Hansen,
T. E. Harris, Boyd Harshbarger, P. M. Hauser, J. F. Hofmann, Harold Hotelling, A. S.
Householder, E. E. Houseman, Helen M. Humes, C. C. Hurd, C. M. Jaeger, R. J. Jessen,
H. L. Jones, Irving Katz, Leo Katz, Harriet J. Kelly, O. Kempthorne, A. W. Kimball, Jr.,
A. J. King, Leslie Kish, L. A. Knowler, Lila F. Knudsen, C. F. Kossack, O. E. Lancaster,
Marvin Lavin, S. B. Littauer, Irving Lorge, F. W. Lott, Jr., Eugene Lukacs, P. J. McCarthy,
C. J. Maloney, John Mandel, Nathan Mantel, H. B. Mann, E. S. Marks, Margaret
Merrell, Helen Michaels, E. B. Mode, A. M. Mood, Nathan Morrison, Dorothy J. Morrow,
J. W. Morse, J. E. Morton, Jack Moshman, Frederick Mosteller, B. D. Mudgett, Hugo
Muench, M. R. Neifeld, R. H. Noel, G. E. Noether, J. I. Northam, H. W. Norton, J. A.
Norton, Jr., E. G. Olds, P. S. Olmstead, Bernard Ostle, A. E. Pauli, Paul Peach, M. P.
Peisakoff, E. W. Pike, E. J. G. Pitman, R. A. Porter, J. A. Rafferty, L. J. Reed, Olav Reiersol,
William Reitz, F. D. Rigby, A. C. Rosander, Herman Rubin, Erik Ruist, P. J. Rulon,
Max Sasuly, F. E. Satterthwaite, L. J. Savage, Mary Ann Savas, Marvin Schneiderman,
Elizabeth Scott, G. R. Seth, Jack Sherman, S. S. Shrikhande, C. R. Simms, J. H. Smith,
G. W. Snedecor, Mortimer Spiegelman, B. R. Stauber, F. F. Stephan, Joseph Steinberg,
J. V. Sturtevant, B. J. Tepping, W. R. Thompson, J. W. Tukey, Jan Vchytil, W. R. Van
Voorhis, D. F. Votaw, Jr., F. M. Wadley, Helen M. Walker, D. L. Wallace, W. A. Wallis,
G. S. Watson, Leonel Weiss, Samuel Weiss, E. L. Welker, M. E. Wescott, Phillips Whidder,
D. R. Whitney, S. S. Wilks, C. P. Winsor, Gerald Winston, M. A. Woodbury, T. D. Woolsey,
Holbrook Working, W. J. Youden.

The first session, a joint session with the American Statistical Association, 
was held at 2:00 P.M. on Monday, December 27, at which time a paper entitled 
Statistical Concepts in an Infinite Number of Dimensions was presented by Pro¬ 
fessor David H. Blackwell of Howard University. Professor E. J. G. Pitman 
of the University of Tasmania was chairman. 

The second session of the opening day was devoted to contributed papers in 
mathematical statistics, and was held at 4:00 P.M. in conjunction with the 
American Statistical Association. Professor W. R. Van Voorhis of Fenn College 
was chairman. The following papers were presented: 

1. A Necessary Condition for a Certain Class of Characteristic Functions. Preliminary 
report. Eugene Lukacs, NOTS, Inyokern, California and Our Lady of Cincinnati 
College, Cincinnati, Ohio. 

2. Precision of Estimates from Samples Selected under Marginal Restrictions. Preliminary
report. Clifford J. Maloney, Research and Development Department, Camp Detrick,
Frederick, Maryland.



3. Properties of Maximum and Quasi-Maximum Likelihood Estimates of Parameters of a 
System of Linear Stochastic Difference Equations with Serially Correlated Disturbances. 
Preliminary report. Herman Rubin, Cowles Commission, University of Chicago. 

4. The Computation of Maximum Likelihood Estimates of Parameters of a System of Linear 
Stochastic Difference Equations with Serially Correlated Disturbances. 

Herman Chernoff, Cowles Commission, University of Chicago. 

5. Test Criteria for Hypotheses of Symmetry and Definiteness of a Regression Matrix for 
Demand Functions. 

Uttam Chand, University of North Carolina. 

6. The Distribution of Extreme Values in Samples whose Members are Stochastically De¬ 
pendent. 

Benjamin Epstein, Wayne University. 

A session on Teaching Statistical Quality Control was held on Monday evening, 
December 27, jointly with the Ohio Section of the American Society for Quality 
Control and Section on Training of Statisticians of the American Statistical 
Association. Professor Samuel S. Wilks of Princeton University presided at the 
session. The following two papers were presented:

1. Teaching Statistical Quality Control for Town and Gown.

Lloyd A. Knowler, State University of Iowa.

2. Instructional Aids for Statistical Quality Control. 

Edwin G. Olds, Carnegie Institute of Technology. 

The session concluded with discussion by Professor Irving W. Burr of Purdue 
University, and Professor Theodore H. Brown of Harvard University. 

A session on Review of Statistical Methodology was held jointly with the Ameri¬ 
can Statistical Association at 2:00 P.M., December 28. Professor Frederick 
Mosteller of Harvard University presided. The following papers were presented: 

1. Surveys and Sampling. 

Philip J. McCarthy, Cornell University. 

2. Industrial Applications. 

Paul S. Olmstead, Bell Telephone Laboratories.

3. Biology, Physical Sciences and Experimental Design.

W. J. Youden, National Bureau of Standards. 

At 4:00 P.M. on Tuesday, December 28, Professor H. C. Fryer of Kansas 
State College presided at a joint session with the Biometric Society and Biometrics
Section of the American Statistical Association. Papers presented were:

1. Evaluation of Field Insecticides from Count of Survivors. 

C. I. Bliss and Neely Turner, Connecticut Agricultural Experiment Station. 

2. Curved Dosage-Response Curves. 

Oscar Kempthorne, Iowa State College. 

3. Statistical Variations in Contents of Dry-Filled Ampuls in Current Pharmaceutical
Practice. 

M. W. Green, American Pharmaceutical Association, and Lila F. Knudsen, Food and 
Drug Administration. 

4. A Practical Method for Determining the Mean and Standard Deviation of Truncated 
Normal Distributions. 

J. Ipsen, Yale University. 



The session was concluded with discussion by D. B. DeLury, Ontario Research 
Foundation; Lloyd Miller, Sterling-Winthrop Research Institute; C. Eisenhart, 
National Bureau of Standards; J. L. Northam, Kansas State College. 

On Wednesday, December 29, at 2:00 P.M., Dr. W. Edwards Deming presided 
at a session on Effects of Error in the Independent Variate in Regression Problems.
This meeting was held in conjunction with the Biometric Society and Biometric 
Section of the American Statistical Association. Papers presented were: 

1. Are There Two Regressions?

Joseph Berkson, Mayo Clinic.

2. Present Status of the Theory. 

Jerzy Neyman, University of California. 

3. The Identifiability of a Linear Relationship Between Variables which are Subject to
Error. 

Olav Reiersol, Purdue University. 

These papers were followed by discussion by Professor Churchill Eisenhart, Na¬ 
tional Bureau of Standards, Elizabeth L. Scott, University of California, and 
C. P. Winsor, Johns Hopkins University. 

Professor Boyd Harshbarger, of the Virginia Polytechnic Institute, presided 
at the Wednesday afternoon session on contributed papers in mathematical statistics.
Papers presented were:

1. On Age-Dependent Stochastic Branching Processes. 

Richard Bellman and Theodore E. Harris, Stanford University, Palo Alto, California,
and the Rand Corporation, Santa Monica, California.

2. Cuboidal Lattices. 

G. S. Watson, Institute of Statistics, University of North Carolina.

3. Transformations Induced by Series Approximation of Prior Probability Amplitude. 
Archie Blake, Office of the Surgeon General, U. S. Army.

4. On the Utilization of Marked Specimens in Estimating Populations of Flying Insects.
Cecil C. Craig, University of Michigan. 

5. On a Probability Distribution. 

Max A. Woodbury, University of Michigan. 

6. Distribution-Free Tests of Data from Factorial Experiments. 

G. W. Brown and A. M. Mood, Iowa State College. 

7. On Sums of Symmetrically Truncated Normal Random Variables. 

Fred C. Andrews and Z. W. Birnbaum, University of Washington. 

8. On the Foundation of Statistics. 

(By title). Max A. Woodbury, University of Michigan. 

9. Finitely Additive Probability Functions. 

(By title). Max A. Woodbury, University of Michigan. 

10. On Inverting a Matrix via the Gram-Schmidt Orthogonalization Process.

(By title). Max A. Woodbury, University of Michigan. 

11. Certain Properties of the Multiparameter Unbiased Estimates. Preliminary report. 
(By title). Gobind R. Seth, Iowa State College. 

12. A Class of Lower Bounds for the Variance of Point Estimates. 

(By title). Douglas Chapman, University of California. 

13. Standard Errors and Tests of Significance for Interpolated Medians. 

(By title). Churchill Eisenhart and Miriam L. Yevick, National Bureau of Standards.



A symposium on Randomness and its Testing occupied the 4:00 P.M. session 
on Wednesday. Dr. Walter A. Shewhart of the Bell Telephone Laboratories 
presided and the following papers were presented: 

1. Survey of Available Tests for Randomness.

W. Allen Wallis, University of Chicago. 

2. Power Functions of Tests for Randomness. 

H. B. Mann, Ohio State University. 

3. Power Functions of Non-Parametric Tests.

Ransom Whitney, Ohio State University. 

Discussion was led by Bernice Brown, The Rand Corporation; Paul S. Olmstead, 
Bell Telephone Laboratories; E. J. G. Pitman, University of Tasmania. 

The morning session on Thursday, December 30, was a joint session with the 
American Statistical Association, with Professor Jerzy Neyman of the University 
of California presiding. The following two papers were presented upon invita¬ 
tion of the Institute: 

1. Estimating Linear Restrictions on Regression Coefficients for Multivariate Normal
Distributions. 

T. W. Anderson, Columbia University. 

2. Some Aspects of the Theory of Testing Composite Hypotheses.

E. L. Lehmann, University of California. 

The Business Meeting was held at 10:00 A.M. on Tuesday, December 28. 
Dr. Churchill Eisenhart presided. A report of this meeting is found elsewhere 
in this issue. 

W. R. Van Voorhis 
Assistant Secretary 


REPORT OF THE PRESIDENT OF THE INSTITUTE FOR 1948 

The last few years have seen a considerable growth of the Institute. The 
upward trend has continued throughout 1948. The Institute has acquired 126 
new members during the year, but this gain is to be balanced against losses due 
to resignation and suspension for non-payment of dues. The Institute starts 
the year 1949 with a membership of about 1,100 as against the membership of 
1,037 at the beginning of 1948. While the net gain is still substantial, it is not 
quite as much as hoped for, and this may serve as an incentive for an increased 
membership drive in 1949. The constantly increasing interest and research 
activities in statistical theory and methodology are well reflected in our meetings 
and the publications appearing in the Annals. 

Meetings. The growth of the Institute in the past few years has brought 
about a considerable increase in its various activities. This manifested itself 
particularly in the extensive and rich programs of the meetings held during the 
year 1948. In addition to the usual invited addresses and contributed papers, 
the programs included a considerable number of symposia on various important 
subjects such as the theory of games (Berkeley, June; Madison, September), 
stochastic difference equations (Madison, September), scales of measurement 
(New York, April), sampling for industrial use (Berkeley, June), etc. The 
eleventh summer meeting was held in conjunction with the meetings of the 
American Mathematical Society and the Econometric Society (Madison, Septem¬ 
ber). The eleventh annual meeting (Cleveland, December) was held in con¬ 
junction with the American Statistical Association, Econometric Society and 
Biometric Society. There were also three regional meetings: New York (April), 
Berkeley (June) and Seattle (November). The Berkeley meeting was held 
in conjunction with the Pacific Division of the American Association for the 
Advancement of Science and some of the sessions of the Seattle meeting were 
sponsored jointly with the Biometric Society. 

To facilitate the organization of meetings and arrangements of programs, 
instead of a single program committee there were three program committees 
appointed, one for Eastern, one for Mid-Western and one for Far-Western meetings. 
These committees consisted of the following members. Eastern Committee: 
W. G. Cochran, C. Eisenhart (Chairman), F. Mosteller, and J. Wolfowitz; 
Mid-Western Committee: C. C. Craig, H. B. Mann, and A. M. Mood 
(Chairman); Far Western Committee: Z. W. Birnbaum, M. A. Girshick, P. G. 
Hoel, and J. Neyman (Chairman). To coordinate the work of these three program 
committees, a coordinating committee was appointed consisting of J. W. 
Tukey (Chairman) and the three chairmen of the three program committees. 
This committee was also charged with the responsibility of making recommenda¬ 
tions to the Board of Directors as to times and places for future meetings. 
Another innovation introduced during the past year was the appointment of 
assistant secretaries in connection with the meetings. S. B. Littauer acted as 
assistant secretary for the New York meeting, K. J. Arnold for the summer 
meeting in Madison, Z. W. Birnbaum for the Seattle meeting and W. R. Van 
Voorhis for the Cleveland meeting. The assistant secretaries were charged with 
the task of looking after the local arrangements that had to be made in connec¬ 
tion with the meetings. The appointment of assistant secretaries proved to 
be a great success not only in facilitating the necessary local arrangements for 
meetings but also in relieving the burden on the secretary’s office. On the basis 
of this year’s experience, it seems very desirable to continue with this practice 
in the future. 

No Rietz Memorial lecture was given in 1948 in accordance with a decision 
of the Board of Directors that these lectures should not be given every year. 
It is planned, however, to have a Rietz lecture for 1949 and the Board of Direc¬ 
tors invited J. Neyman to deliver it. 

The New Constitution. One of the major events of the year was the adoption 
of the new constitution at the meeting in Madison. The growth of the Institute 
in recent years made parts of the old constitution obsolete and the need for a revision 
was apparent. Our thanks are due to the Committee on Planning and 
Development which has devoted much time and consideration to the study of 
the problem and prepared a draft of a revised constitution. M. H. Hansen was 
chairman of this Committee. Other members were: J. H. Curtiss, W. G. 
Cochran, W. Feller, J. Neyman, H. W. Norton, F. F. Stephan, J. W. Tukey, 
and W. A. Wallis. A draft of the new By-Laws was prepared by J. W. Tukey, 
who acted as a subcommittee of the Committee on Planning and Development. 

Annals. The growth of the Institute during the past few years has manifested 
itself also in a constantly increasing number of manuscripts submitted for 
publication in the Annals. While it is very gratifying to see this upward trend, 
it raises some problems of financial nature. At the rate manuscripts are com¬ 
ing in, an expansion of the publication facilities of the Institute would seem 
very desirable. Increase of the volume of the Annals would, however, mean 
increased cost and the present financial situation of the Institute could not 
allow such an additional burden unless some new sources of income can be found. 
Apart from a possible increase in the cost of printing the Annals, it seems that 
additional expenditures will be necessary for secretarial help in 1949. It was 
decided at the membership meeting in Madison that additional funds be raised 
through the contributions of universities and other organizations with strong 
interest in mathematical statistics and through the contributions of the members. 
Appeals for such contributions were sent out and it is hoped that there will be a 
generous response. 

The new constitution permits the appointment of responsible Associate Edi¬ 
tors. This brings up the whole question of editorial set-up and policies. A 
committee with S. S. Wilks as chairman was appointed to make a thorough study 
of the Institute’s publication experience and to make recommendations as to 
publication policies and editorial set-up. Other members of this committee are: 
W. G. Cochran, W. Feller, M. A. Girshick, J. Neyman, P. S. Olmstead, W. A. 
Wallis and J. Wolfowitz. The committee gave much thought and considera¬ 
tion to the problems involved and will report to the newly elected officers and 
Council. 

The Annals has developed under the leadership of the Editor, S. S. Wilks, 
to one of the outstanding professional journals. I am sure that I can speak for 
all our members in expressing the Institute’s indebtedness to S. S. Wilks for his 
untiring and most successful work. 

Committees. The problem of classification of statisticians in the Government 
service is naturally of considerable importance to the statistical profession. A 
committee consisting of W. E. Deming (chairman) and C. Eisenhart was ap¬ 
pointed to make a thorough study of this question with a view to advising the 
Civil Service Commission. The committee prepared a report in which three 
main categories of statisticians in Government Service are distinguished: mathematical 
statisticians, statistical analysts and data-collecting statisticians. The 
report was transmitted to the Civil Service Commission with the approval of the 
Board of Directors. The members of this committee are to be commended for 
the excellent work they have done in spite of the severe limitation of time allotted 
by the Civil Service Commission. The work on the problem of classification 
of statisticians still goes on and a committee of experts consisting of members 
of the Washington Statistical Society, the Institute of Mathematical Statistics, 
and the American Statistical Association has been set up to advise the 
Civil Service Commission on this problem. Our representatives on this com¬ 
mittee of experts are: W. E. Deming, C. Eisenhart, M. H. Hansen and S. Weiss. 

The advances in numerical computations in recent years have made an enlargement 
and reorganization of the Committee on Tabulation necessary. Its present 
members are: R. L. Anderson, C. Eisenhart (Chairman), A. M. Mood, F. 
Mosteller, H. G. Romig, L. E. Simon, and J. W. Tukey. The objectives of this 
committee, as outlined by the chairman are: (1) to prepare a comprehensive 
list of new mathematical tables that would be of value in statistical theory and 
applications, (2) to assemble an American Collection of “Tables for Statisticians”, 
(3) to prepare a list of mathematical tables of importance in statistical 
theory and applications to be recommended for inclusion in the proposed National 
Bureau of Standards volume of “Tables for the Occasional Computer”. 
To implement the program of the committee, the following sub-committees have 
been constituted: (1) “On Computing Centers” with L. E. Simon as Chairman, 
(2) “On Ranks and Runs” with A. M. Mood as Chairman, (3) “On Serial Correlations” 
with R. L. Anderson as Chairman, (4) “On 2x2 Tables” with C. 
Eisenhart as Chairman, (5) “On Order Statistics” with F. Mosteller as Chairman, 
(6) “On Binomial, Poisson, and Hypergeometric Distributions” with 
H. G. Romig as Chairman, (7) “On Miscellaneous Tables” with J. W. Tukey 
as Chairman. 

On the recommendation of the membership committee, consisting of H. 
Scheffé (chairman), C. C. Craig, P. G. Hoel and F. F. Stephan, the following 
members have been elected as Fellows: J. Berkson, E. L. Lehmann, E. J. G. 
Pitman, H. E. Robbins and C. M. Stein. The members of the finance committee 
for 1948 were P. S. Dwyer (chairman), C. F. Roos, L. A. Knowles and 
T. N. E. Greville. 

The Nominating Committee for 1948 consisted of W. Bartky (chairman), 
C. C. Craig, J. F. Daly, H. A. Freeman, E. L. Lehmann and W. G. Madow. The 
committee nominated J. Neyman for President, J. L. Doob for President-Elect 
and 24 Council members for the 12 positions to be filled. In accordance with 
the provisions of the new constitution, the Nominating Committee for 1949 has 
also been appointed. The members of this Committee are: W. G. Cochran 
(Chairman), M. H. Hansen, H. B. Mann, A. M. Mood and H. G. Romig. 

The Board of Directors has been exploring the possibilities for a closer co¬ 
operation with our colleagues abroad and for making foreign statistical publica¬ 
tions more easily accessible to our members. In particular, there has been 
correspondence with Professor E. S. Pearson, Managing Editor of Biometrika, 
on the question of a possible reduction of the subscription rate of Biometrika 
for our members. As a result of these discussions, Professor Pearson offered 
certain reductions, provided that a sufficient number of subscribers can be se¬ 
cured. Detailed information on this was contained in a memorandum of the 
Secretary, P. S. Dwyer, in the November mailing to the membership. It is 
hoped that many of our members will make use of this opportunity. 

With the new constitutions of the American Statistical Association and the 
Institute of Mathematical Statistics adopted, the way is cleared for the considera¬ 
tion of possible federation plans of the various statistical organizations by the 
Inter-Society Committee on Federation. J. H. Curtiss and P. S. Olmstead con¬ 
tinued to serve as our representatives on the aforementioned committee during 
1948. W. Feller was our representative on the Policy Committee for Mathe¬ 
matics, and F. C. Mosteller and S. S. Wilks represented the Institute on the 
Joint Committee for the Development of Statistical Application in Engineering 
and Manufacturing. W. Bartky was reappointed for a three-year term as our 
representative to the Division of the Physical Sciences of the National Research 
Council, and H. Hotelling was our representative to the American Association 
for the Advancement of Science. 

In conclusion, I wish to thank all committee members and others who par¬ 
ticipated in the work of the Institute during the past year. The heaviest burden 
falls, of course, on the Secretary and it is hard to express adequately our ap¬ 
preciation for his unselfish efforts and devotion. The smooth and efficient con¬ 
duct of the affairs of the Institute is largely due to his work. 

Abraham Wald 
President, 1948 

December 31, 1948 


REPORT OF THE SECRETARY-TREASURER OF THE INSTITUTE FOR 1948¹ 

At the beginning of 1948 the Institute had 1037 members and during the 
period covered by this report 126 new members (13 of whom begin their mem¬ 
bership with 1949) joined the Institute and two members were re-instated. 
During 1948 the Institute lost 64 members, of whom 24 were by resignation, 38 
by suspension for non-payment of dues and 2 by death. Judging from the 
information available at this date, the Institute will have 1101 members as it 
starts 1949. 

Deceased during the year were Dr. Otis A. Pope and H. M. Tompkins. 

Meetings of the Institute held during 1948 included those at Columbia Uni¬ 
versity on April 14-15, at the Berkeley campus of the University of California 
on June 22-24, at the University of Wisconsin on September 6-10, at the Uni¬ 
versity of Washington on November 26-27, and at Cleveland on December 
26-30. The Secretary wishes to call attention to the excellent work of the 
members who served as assistant secretaries at these meetings: Professor 
Littauer at New York, Professor Arnold at Madison, Professor Birnbaum at 
Seattle and Professor Van Voorhis at Cleveland. 

¹ This report covers the period January 1, 1948 to December 20, 1948 as the books were 
closed on December 20, 1948 so that the report could be made at the annual meeting. 

A summary of the financial transactions of the Institute is given in 
the Financial Statement for 1948 which follows: 

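FINANCIAL STATEMENT 
December 31, 1947 to December 20, 1948 

A. RECEIPTS 

Balance on Hand,² December 31, 1947 ......................   $5,858.37 
Dues .....................................................    7,482.21 
Contributions ............................................      255.50 
Subscriptions ............................................    3,660.40 
Sale of Back Numbers .....................................    2,718.27 
Income from Investments ..................................      100.00 
Advertising ..............................................      160.00 
Miscellaneous ............................................       57.24 

Total ....................................................  $20,291.99 

B. EXPENDITURES 

Annals - Current 
    Office of the Editor .....................     $175.00 
    Waverly Press ............................    7,824.66   $7,999.66 
Annals - Back Numbers 
    Reprinted Vol. XI #2 & #3; XII #2 & #3; XIV #4 .......    1,968.50 
Mathematical Reviews and Inter-Society Committee .........      225.00 
Office of the Secretary-Treasurer 
    Printing, memoranda, etc. (including some 
        stamped envelopes) ...................    1,174.52 
    Postage, supplies, express, telephone calls     225.00 
    Clerical help ............................    1,468.00 
    Travelling Expense .......................       30.48    2,898.00 
Miscellaneous ............................................       79.82 
Balance on Hand,² December 20, 1948 ......................    7,121.01 

Total ....................................................  $20,291.99 

C. SUMMARY OF RECEIPTS AND EXPENDITURES 

Balance on Hand,² December 31, 1947 ......................   $5,858.37 
Receipts during 1948 .....................................   14,433.62 
Expenditures during 1948 .................................   13,170.98 
Balance on Hand,² December 20, 1948 ......................    7,121.01 

² In bank deposits and government bonds. 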
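
The arithmetic of sections A to C can be cross-checked mechanically. The short Python sketch below is an illustration added for the reader and is not part of the Secretary-Treasurer's report; the dictionary keys and variable names are hypothetical labels, and the amounts are simply those printed in the statement.

```python
from decimal import Decimal as D

# Amounts copied from section A (excluding the opening balance).
receipts = {
    "dues": D("7482.21"),
    "contributions": D("255.50"),
    "subscriptions": D("3660.40"),
    "sale_of_back_numbers": D("2718.27"),
    "income_from_investments": D("100.00"),
    "advertising": D("160.00"),
    "miscellaneous": D("57.24"),
}

# Amounts copied from section B (excluding the closing balance).
expenditures = {
    "annals_current": D("7999.66"),
    "annals_back_numbers": D("1968.50"),
    "math_reviews_and_inter_society": D("225.00"),
    "office_of_secretary_treasurer": D("2898.00"),
    "miscellaneous": D("79.82"),
}

opening_balance = D("5858.37")  # Balance on hand, December 31, 1947

total_receipts = sum(receipts.values(), D("0"))
total_expenditures = sum(expenditures.values(), D("0"))
closing_balance = opening_balance + total_receipts - total_expenditures

print("Receipts during 1948:    ", total_receipts)
print("Expenditures during 1948:", total_expenditures)
print("Balance, Dec. 20, 1948:  ", closing_balance)
```

Under these figures the computed receipts, expenditures and closing balance agree with the three amounts listed in section C.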

D. LIFE MEMBERSHIP FUNDS 

It has been the practice to place all life membership payments in a special fund (most of 
which is in government bonds) and to hold all these funds in reserve until the death of the 
member, after which his payment is released to the general fund. There were no new life 
membership payments in 1948. During the year a transfer to the general fund has been 
made of the life membership payment of Professor Irving Fisher, who died in 1947. 


                                   December 31, 1947    December 20, 1948 

Number of Life Members ..........          30                   29 
U. S. Government Bonds ..........      $1,888.00            $1,888.00 
Bank Deposits ...................         427.00               392.00 

Total ...........................      $2,315.00            $2,280.00 


E. BACK ISSUES FUND 

It has been our policy, since January 1, 1948, to use income from the sale of back issues 
to finance the additional reprinting of back issues. 


Income from the Sale of Back Issues during 1948 ..........  $2,718.27 
Expense for Reprinting Back Issues in 1948 ...............   1,968.50 

Balance in the Fund, December 20, 1948 ...................    $749.77 

At present 500 copies of Volume 13 #1 and #2 are being reprinted at a cost of $735.00. 
The payment of this in January will leave a small balance in the fund. 


F. COMPARISON OF ASSETS ON DECEMBER 31, 1947 AND DECEMBER 20, 1948 

                                               1947          1948 
U. S. Government G Bonds ..............   $3,000.00     $3,000.00 
Life Membership Funds .................    2,315.00      2,280.00 
Back Issues Fund ......................        -           749.77 
Additional Bank Deposits ..............      543.47      1,091.24 
Current Accounts Receivable ...........      423.55        291.22 
Estimated Value (Cost) of Back Annals³    10,866.73     12,785.61 

Total .................................  $17,148.65    $20,197.84 

Net Gain 1948 .........................                  3,049.19 


G. LIABILITIES OF INSTITUTE OF MATHEMATICAL STATISTICS AS OF DECEMBER 20, 1948 

All bills which have been presented have been paid. The Life Membership Fund now 
contains $2,280.00 which covers 29 members. Also, $4,060.50 has been paid in for con¬ 
tributions and 1949 dues and subscriptions. 

This report does not cover the amount of $13.95 which is held by the Institute 
for the fund for Annals for Countries Devastated by the War. (This fund has 
been under the supervision of Professor Neyman.) During the year this fund 
purchased $376.25 in back issues (at the agreed rate of $4.50 per volume) which 
has contributed to the total sales in back issues. 

There has been little change in the life membership fund during the year. 
Our practice of making no transfer of life membership funds until the death 
of the member is most conservative and protects the interests of the life member. 

The question of the value of our inventory is always difficult. We now have 
19,083 issues of the Annals. At 67¢ per copy, it appears that $12,785.61 is a 
fair estimate of their actual cost. This is in fact less than 5 times the actual 
income from back issues this year and hence seems to be a very conservative 
estimate of the marketable (within ten years) value of our present inventory. 

³ Cost of Annals calculated at 67 cents per copy. 

We are in a position now to continue to supply all issues beginning with volume 
7 and expect that the sales in back volumes will be such that within two or three 
years we will be able to reprint the 9 issues in volumes 1-6 which are now prac¬ 
tically or completely exhausted. 

It appears that the increase in dues and subscriptions has been adequate to 
take care of the increased expense during 1948. No bonds have been cashed 
during the year. Additional funds appear necessary for 1949, however, since 
the present amount of clerical help in the office of the Secretary-Treasurer is 
utterly inadequate. The employment of additional secretarial assistance, which 
the Institute must have, will increase the total expense of this office by about 
$1,200.00. It is necessary, too, to provide a cushion for a possible increase in 
our Waverly bill, which is up about 10% in 1948. It appears that we may 
need from $1,500.00 to $2,000.00 additional funds for 1949. Available sources 
are increases in the number of members and subscribers, contributions from 
our members, and institutional contributions and memberships. 

Paul S. Dwyer 
Secretary-Treasurer 

December 21, 1948 


REPORT OF THE EDITOR FOR 1948 

During 1948 the rate of submission of manuscripts for publication in the 
Annals has continued to increase. The size of the Annals was held approxi¬ 
mately to that set for 1947, the number of pages printed in 1948 being 610. 
The 1948 volume of the Annals contained 59 papers, of which 24 were short 
notes. 

During the past year the backlog of papers has increased to nearly two issues. 
Thus manuscripts submitted now, especially the longer ones, must wait at least 
six months after being refereed in order to be printed. If the rate at which 
manuscripts are submitted increases, as it has during the last two years, this 
waiting gap may increase to a year by the end of 1949. 

If additional funds could be found, it would be highly desirable to increase the 
Annals to 700 pages in 1949. 

The manuscripts being received continue to cover a rather wide range of 
topics in probability and statistics. Almost all of them are research papers. 
In the Editor’s opinion it would be highly desirable for the Institute to take steps, 
perhaps through invited addresses, to secure good expository and review articles. 
Sustained attempts have been made over a period of years to obtain such articles 
by invitation, but with little success. 

The Editor wishes to take this opportunity to acknowledge, on behalf of the 
Editorial Committee, the generous refereeing assistance which has been given by 
the following persons: Z. W. Birnbaum, A. H. Bowker, I. W. Burr, G. W. Brown, 
K. L. Chung, W. J. Dixon, A. Dvoretzky, T. N. E. Greville, F. E. Grubbs, 
M. H. Hansen, T. E. Harris, C. Hastings, H. B. Horton, G. A. Hunt, B. F. 
Kimball, T. Koopmans, H. Levene, M. S. MacPhail, P. J. McCarthy, R. B. 
Murphy, M. P. Peisakoff, P. S. Olmstead, E. Paulson, H. G. Romig, L. J. Savage, 
F. F. Stephan, D. F. Votaw and J. E. Walsh. 

The Editor owes special acknowledgment to Mr. M. E. Freeman for prepara¬ 
tion of manuscripts and to Mrs. Frances M. Purvis for other editorial and office 
assistance. 

S. S. Wilks 
Editor 

December 31, 1948. 



STATISTICAL DECISION FUNCTIONS 
By Abraham Wald¹ 

Columbia University 

Introduction and summary. The foundations of a general theory of statistical 
decision functions, including the classical non-sequential case as well as the 
sequential case, were discussed by the author in a previous publication 
[3]. Several assumptions made in [3] appear, however, to be unnecessarily restrictive 
(see conditions 1-7, pp. 297 in [3]). These assumptions, moreover, 
are not always fulfilled for statistical problems in their conventional form. In 
this paper the main results of [3], as well as several new results, are obtained 
from a considerably weaker set of conditions which are fulfilled for most of the 
statistical problems treated in the literature. It seemed necessary to abandon 
most of the methods of proofs used in [3] (particularly those in section 4 of [3]) 
and to develop the theory from the beginning. To make the present paper self- 
contained, the basic definitions already given in [3] are briefly restated in 
section 2.1. 

In [3] it is postulated (see Condition 3, p. 297) that the space Ω of all admissible 
distribution functions F is compact. In problems where the distribution function 
F is known except for the values of a finite number of parameters, i.e., where 
Ω is a parametric class of distribution functions, the compactness condition will 
usually not be fulfilled if no restrictions are imposed on the possible values of the 
parameters. For example, if Ω is the class of all univariate normal distributions 
with unit variance, Ω is not compact. It is true that by restricting the parameter 
space to a bounded and closed subset of the unrestricted space, compactness of 
Ω will usually be attained. Since such a restriction of the parameter space can 
frequently be made in applied problems, the condition of compactness may not 
be too restrictive from the point of view of practical applications. Nevertheless, 
it seems highly desirable from the theoretical point of view to eliminate or to 
weaken the condition of compactness of Ω. This is done in the present paper. 
The compactness condition is completely omitted in the discrete case (Theorems 
2.1-2.5), and replaced by the condition of separability of Ω in the continuous 
case (Theorems 3.1-3.4). The latter condition is fulfilled in most of the conventional 
statistical problems. 

Another restriction postulated in [3] (Condition 4, p. 297) is the continuity 
of the weight function W(F, d) in F. As explained in section 2.1 of the present 
paper, the value of W(F, d) is interpreted as the loss suffered when F happens to 
be the true distribution of the chance variables under consideration and the 
decision d is made by the statistician. While the assumption of continuity of 
W(F, d) in F may seem reasonable from the point of view of practical application, 
it is rather undesirable from the theoretical point of view for the following 
reasons. It is of considerable theoretical interest to consider simplified weight 
functions W(F, d) which can take only the values 0 and 1 (the value 0 corresponds 
to a correct decision, and the value 1 to a wrong decision). Frequently, such 
weight functions are necessarily discontinuous. Consider, for example, the 
problem of testing the hypothesis H that the mean $\theta$ of a normally distributed 
chance variable X with unit variance is equal to zero. Let $d_1$ denote the decision 
to accept H, and $d_2$ the decision to reject H. Assigning the value zero to the 
weight W whenever a correct decision is made, and the value 1 whenever a 
wrong decision is made, we have: 

$W(\theta, d_1) = 0$ for $\theta = 0$, and $= 1$ for $\theta \neq 0$; $W(\theta, d_2) = 0$ for $\theta \neq 0$, and $= 1$ for $\theta = 0$. 

This weight function is obviously discontinuous. In the present paper the 
main results (Theorems 2.1-2.5 and Theorems 3.1-3.4) are obtained without 
making any continuity assumption regarding W(F, d). 

¹ Work done under the sponsorship of the Office of Naval Research. 
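
The discontinuity of the 0-1 weight function in the testing example can be seen directly by evaluating it at values of the mean close to zero. The following Python sketch is only an illustration; the function name `weight` and the string labels `"d1"` and `"d2"` are hypothetical and do not appear in the paper.

```python
def weight(theta: float, decision: str) -> int:
    """0-1 weight for testing H: theta = 0.

    decision "d1" means accept H, "d2" means reject H.
    The weight is 0 for a correct decision and 1 for a wrong one.
    """
    hypothesis_true = (theta == 0.0)
    if decision == "d1":
        return 0 if hypothesis_true else 1
    if decision == "d2":
        return 1 if hypothesis_true else 0
    raise ValueError("decision must be 'd1' or 'd2'")


# As theta approaches 0 the weight of d1 stays at 1, yet it equals 0 at theta = 0.
for theta in (1.0, 1e-3, 1e-12, 0.0):
    print(theta, weight(theta, "d1"), weight(theta, "d2"))
```

However small a nonzero mean is taken, the weight of accepting H remains 1, while at zero it drops to 0, so no continuity in the distribution argument can hold.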

The restrictions imposed in the present paper on the cost function of experimentation 
are considerably weaker than those formulated in [3]. Condition 5 
[3, p. 297] concerning the class Ω of admissible distribution functions, and condition 
7 [3, p. 298] concerning the class of decision functions at the disposal of 
the statistician are omitted here altogether. 

One of the new results obtained here is the establishment of the existence 
of so-called minimax solutions under rather weak conditions (Theorems 2.3 and 
3.2). This result is a simple consequence of two lemmas (Lemmas 2.4 and 3.3) 
which seem to be of interest in themselves. 

The present paper consists of three sections. In the first section several 
theorems are given concerning zero sum two person games which go somewhat 
beyond previously published results. The results in section 1 are then applied 
to statistical decision functions in sections 2 and 3. Section 2 treats the case of 
discrete chance variables, while section 3 deals with the continuous case. The 
two cases have been treated separately, since the author was not able to find 
any simple and convenient way of combining them into a single more general 
theory. 

1. Conditions for strict determinateness of a zero sum two person game. 
The normalized form of a zero sum two person game may be defined as follows 
(see [1, section 14.1]): there are two players and there is a bounded and real 
valued function K(a, b) of two variables a and b given where a may be any point 
of a space A and b may be any point of a space B. Player 1 chooses a point 
a in A and player 2 chooses a point b in B, each choice being made in complete 
ignorance of the other. Player 1 then gets the amount K(a, b) and player 2 the 
amount −K(a, b). Clearly, player 1 wishes to maximize K(a, b) and player 2 
wishes to minimize K(a, b). 
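
When A and B are finite, K(a, b) is simply a payoff matrix, and the amount each player can secure by committing to a single point is easy to compute. The Python sketch below is an illustration under that finiteness assumption; the function names and the two example matrices are hypothetical and are not taken from the paper.

```python
def lower_value(K):
    """Largest amount player 1 can guarantee with a single row a:
    the maximum over a of the minimum over b of K[a][b]."""
    return max(min(row) for row in K)


def upper_value(K):
    """Smallest cap player 2 can enforce with a single column b:
    the minimum over b of the maximum over a of K[a][b]."""
    return min(max(column) for column in zip(*K))


# Rows are choices of player 1, columns are choices of player 2.
matching_pennies = [[1, -1],
                    [-1, 1]]
with_saddle_point = [[3, 1],
                     [4, 2]]

print(lower_value(matching_pennies), upper_value(matching_pennies))
print(lower_value(with_saddle_point), upper_value(with_saddle_point))
```

For the second matrix the two quantities coincide, while for matching pennies they differ; it is this gap that the chance mechanisms introduced next are meant to close.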

Any element a of A will be called a pure strategy of player 1, and any element 
b of B a pure strategy of player 2. A mixed strategy of player 1 is defined as 
follows: instead of choosing a particular element a of A, player 1 chooses a 
probability measure $\xi$ defined over an additive class $\mathfrak{A}$ of subsets of A and the 
point a is then selected by a chance mechanism constructed so that for any 
element $\alpha$ of $\mathfrak{A}$ the probability that the selected element a will be contained in 
$\alpha$ is equal to $\xi(\alpha)$. Similarly, a mixed strategy of player 2 is given by a probability 
measure $\eta$ defined over an additive class $\mathfrak{B}$ of subsets of B and the element b 
is selected by a chance mechanism so that for any element $\beta$ of $\mathfrak{B}$ the probability 
that the selected element b will be contained in $\beta$ is equal to $\eta(\beta)$. The expected 
value of the outcome K(a, b) is then given by 

(1.1) $K^*(\xi, \eta) = \int_B \int_A K(a, b) \, d\xi \, d\eta.$ 

We can now reinterpret the value of K(a, b) as the value of $K^*(\xi_a, \eta_b)$ where 
$\xi_a$ and $\eta_b$ are probability measures which assign probability 1 to a and b, respectively. 
In what follows, we shall write $K(\xi, \eta)$ for $K^*(\xi, \eta)$, K(a, b) will be used 
synonymously with $K(\xi_a, \eta_b)$, $K(a, \eta)$ synonymously with $K(\xi_a, \eta)$ and $K(\xi, b)$ 
synonymously with $K(\xi, \eta_b)$. This can be done without any danger of confusion. 
A game is said to be strictly determined if 

(1.2) $\sup_{\xi} \inf_{\eta} K(\xi, \eta) = \inf_{\eta} \sup_{\xi} K(\xi, \eta).$ 
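
For finite A and B the integral in (1.1) reduces to a double sum, and the two sides of (1.2) can be bounded by trying particular mixed strategies. The Python sketch below assumes finite spaces and represents a mixed strategy as a list of probabilities; all function names are hypothetical. It relies on the fact that the expected payoff is linear in each player's measure, so the infimum over player 2's mixed strategies (and the supremum over player 1's) is attained at a pure strategy.

```python
from fractions import Fraction as F


def expected_payoff(K, xi, eta):
    """Finite form of (1.1): sum over a and b of K[a][b] * xi[a] * eta[b]."""
    return sum(K[a][b] * xi[a] * eta[b]
               for a in range(len(xi)) for b in range(len(eta)))


def guaranteed_to_player1(K, xi):
    """Inf over eta of K(xi, eta), attained at some pure strategy b."""
    return min(sum(K[a][b] * xi[a] for a in range(len(xi)))
               for b in range(len(K[0])))


def cap_enforced_by_player2(K, eta):
    """Sup over xi of K(xi, eta), attained at some pure strategy a."""
    return max(sum(K[a][b] * eta[b] for b in range(len(eta)))
               for a in range(len(K)))


K = [[1, -1],
     [-1, 1]]                      # matching pennies
half = [F(1, 2), F(1, 2)]          # each pure strategy with probability 1/2

print(expected_payoff(K, half, half))        # 0
print(guaranteed_to_player1(K, half))        # 0
print(cap_enforced_by_player2(K, half))      # 0
print(expected_payoff(K, [1, 0], [0, 1]))    # -1, i.e. K(a, b) for pure a, b
```

Here the left member of (1.2) is at least 0 and the right member is at most 0; since the left member can never exceed the right, both equal 0 and this game is strictly determined once mixed strategies are allowed.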

The basic theorem proved by von Neumann [1] states that if A and B are 
finite the game is always strictly determined, i.e., (1.2) holds. In some previous 
publications (see [2] and [3]) the author has shown that (1.2) always holds if one 
of the spaces A and B is finite or compact in the sense of some intrinsic metric, 
but does not necessarily hold otherwise. A necessary and sufficient condition 
for the validity of (1.2) was given in [2] for spaces A and B with countably many 
elements. In this section we shall give sufficient conditions as well as necessary 
and sufficient conditions for the validity of (1.2) for arbitrary spaces A and B. 
These results will then be used in later sections. 
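
A standard example in which (1.2) fails for countably infinite spaces, given here as an illustration and not taken from the paper, is the game on the positive integers in which the larger number wins: K(a, b) = 1 if a > b, −1 if a < b, and 0 if a = b. The Python sketch below handles only finitely supported mixed strategies, represented as a hypothetical dictionary from integers to probabilities.

```python
from fractions import Fraction as F


def K(a: int, b: int) -> int:
    """Payoff to player 1: +1 if a > b, -1 if a < b, 0 if equal."""
    return (a > b) - (a < b)


def payoff_against_pure(xi: dict, b: int):
    """K(xi, b) for a finitely supported mixed strategy xi = {a: probability}."""
    return sum(p * K(a, b) for a, p in xi.items())


def reply_of_player2(xi: dict) -> int:
    """A pure strategy of player 2 that beats every point in the support of xi."""
    return max(xi) + 1


xi = {1: F(1, 2), 5: F(1, 4), 40: F(1, 4)}
b = reply_of_player2(xi)
print(b, payoff_against_pure(xi, b))
```

Whatever finitely supported strategy player 1 uses, player 2 can hold the payoff to −1 by naming a larger integer. For a general probability measure the same conclusion follows because the probability of exceeding b tends to 0 as b grows, so the left member of (1.2) is −1; by symmetry the right member is +1, and the game is not strictly determined.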

In what follows, for any subset $\alpha$ of A the symbol $\xi_\alpha$ will denote a probability 
measure $\xi$ in A for which $\xi(\alpha) = 1$. Similarly, for any subset $\beta$ of B, $\eta_\beta$ will stand 
for a probability measure $\eta$ in B for which $\eta(\beta) = 1$. We shall now prove the 
following lemma. 

Lemma 1.1. Let $\{\alpha_i\}$ (i = 1, 2, ..., ad inf.) be a sequence of subsets of A 
such that $\alpha_i \subset \alpha_{i+1}$ and let $\alpha = \sum_{i=1}^{\infty} \alpha_i$. Then 

(1.3) $\lim_{i \to \infty} \sup_{\xi_{\alpha_i}} \inf_{\eta} K(\xi_{\alpha_i}, \eta) = \sup_{\xi_\alpha} \inf_{\eta} K(\xi_\alpha, \eta).$ 

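Proof: Clearly, the limit of $\sup_{\xi_{\alpha_i}} \inf_{\eta} K(\xi_{\alpha_i}, \eta)$ exists as $i \to \infty$ and cannot 
exceed the value of the right hand member in (1.3). Put 

(1.4) $\lim_{i \to \infty} \sup_{\xi_{\alpha_i}} \inf_{\eta} K(\xi_{\alpha_i}, \eta) = \rho$ 

and 

(1.5) $\sup_{\xi_\alpha} \inf_{\eta} K(\xi_\alpha, \eta) = \rho + \delta \quad (\delta \geq 0).$ 

Suppose that $\delta > 0$. Then there exists a probability measure $\xi^0_\alpha$ such that 

(1.6) $K(\xi^0_\alpha, \eta) \geq \rho + \frac{\delta}{2}$ for all $\eta$. 

Let $\xi^0_{\alpha_i}$ be the probability measure defined as follows: for any subset $\alpha^*$ of 
$\alpha_i$ we have 

(1.7) $\xi^0_{\alpha_i}(\alpha^*) = \dfrac{\xi^0_\alpha(\alpha^*)}{\xi^0_\alpha(\alpha_i)}.$ 

Then, since $\lim_{i \to \infty} \xi^0_\alpha(\alpha - \alpha_i) = 0$, we have 

(1.8) $\lim_{i \to \infty} K(\xi^0_{\alpha_i}, \eta) = K(\xi^0_\alpha, \eta)$ 

uniformly in $\eta$. Hence, for sufficiently large i, we have 

(1.9) $\inf_{\eta} K(\xi^0_{\alpha_i}, \eta) > \rho + \frac{\delta}{4},$ 

which is a contradiction to (1.4). Thus, $\delta = 0$ and Lemma 1.1 is proved. 
Interchanging the role of the two players, we obtain the following lemma. 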

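Lemma 1.2. Let $\{\beta_i\}$ be a sequence of subsets of B such that $\beta_i \subset \beta_{i+1}$ and let 
$\sum_{i=1}^{\infty} \beta_i = \beta$. Then 

(1.10) $\lim_{i \to \infty} \inf_{\eta_{\beta_i}} \sup_{\xi} K(\xi, \eta_{\beta_i}) = \inf_{\eta_\beta} \sup_{\xi} K(\xi, \eta_\beta).$ 

We shall now prove the following lemma. 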

Lemma 1.3. The inequality² 

(1.11) $\sup_{\xi} \inf_{\eta} K(\xi, \eta) \leq \inf_{\eta} \sup_{\xi} K(\xi, \eta)$ 

always holds. 

Proof: For any given $\epsilon > 0$, it is possible to find probability measures $\xi^0$ and 
$\eta^0$ such that 

(1.12) $\inf_{\eta} \sup_{\xi} K(\xi, \eta) \geq \sup_{\xi} K(\xi, \eta^0) - \epsilon$ 

and 

(1.13) $\sup_{\xi} \inf_{\eta} K(\xi, \eta) \leq \inf_{\eta} K(\xi^0, \eta) + \epsilon.$ 

Then we have, 

(1.14) $\sup_{\xi} \inf_{\eta} K(\xi, \eta) \leq \inf_{\eta} K(\xi^0, \eta) + \epsilon \leq K(\xi^0, \eta^0) + \epsilon \leq \sup_{\xi} K(\xi, \eta^0) + \epsilon \leq \inf_{\eta} \sup_{\xi} K(\xi, \eta) + 2\epsilon.$ 

Since $\epsilon$ can be chosen arbitrarily small, Lemma 1.3 is proved. 

² This inequality was given by v. Neumann [1] for finite spaces A and B. 

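Theorem 1.1. If $\alpha$ is a subset of A such that 

$\sup_{\xi_\alpha} \inf_{\eta} K(\xi_\alpha, \eta) = \inf_{\eta} \sup_{\xi_\alpha} K(\xi_\alpha, \eta)$ 

and 

$\inf_{\eta} \sup_{\xi_\alpha} K(\xi_\alpha, \eta) = \inf_{\eta} \sup_{\xi} K(\xi, \eta),$ 

then 

$\sup_{\xi} \inf_{\eta} K(\xi, \eta) = \inf_{\eta} \sup_{\xi} K(\xi, \eta).$ 

Proof: Clearly, 

(1.15) $\sup_{\xi_\alpha} \inf_{\eta} K(\xi_\alpha, \eta) \leq \sup_{\xi} \inf_{\eta} K(\xi, \eta)$ 

and 

(1.16) $\inf_{\eta} \sup_{\xi_\alpha} K(\xi_\alpha, \eta) \leq \inf_{\eta} \sup_{\xi} K(\xi, \eta).$ 

If the left hand members of (1.15) and (1.16) are equal to each other and 
equal to the right member of (1.16), then 

(1.17) $\sup_{\xi} \inf_{\eta} K(\xi, \eta) \geq \inf_{\eta} \sup_{\xi} K(\xi, \eta).$ 

Because of Lemma 1.3 the equality sign must hold and Theorem 1.1 is proved. 
Interchanging the two players, we obtain from Theorem 1.1: 

Theorem 1.2. If $\beta$ is a subset of B such that 

$\sup_{\xi} \inf_{\eta_\beta} K(\xi, \eta_\beta) = \inf_{\eta_\beta} \sup_{\xi} K(\xi, \eta_\beta)$ 

and 

$\sup_{\xi} \inf_{\eta_\beta} K(\xi, \eta_\beta) = \sup_{\xi} \inf_{\eta} K(\xi, \eta),$ 

then 

$\sup_{\xi} \inf_{\eta} K(\xi, \eta) = \inf_{\eta} \sup_{\xi} K(\xi, \eta).$ 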

We dhall now prove the following theorem. 

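Theorem 1.3. If $\{\alpha_i\}$ is a sequence of subsets of $A$ such that $\alpha_i \subset \alpha_{i+1}$ and
$\sum_{i=1}^{\infty} \alpha_i = A$, and if

(1.18)    $\sup_{\xi_{\alpha_i}} \inf_{\eta} K(\xi_{\alpha_i}, \eta) = \inf_{\eta} \sup_{\xi_{\alpha_i}} K(\xi_{\alpha_i}, \eta)$

for each $i$, then a necessary and sufficient condition for the validity of

(1.19)    $\sup_{\xi} \inf_{\eta} K(\xi, \eta) = \inf_{\eta} \sup_{\xi} K(\xi, \eta)$

is that

(1.20)    $\lim_{i \to \infty} \inf_{\eta} \sup_{\xi_{\alpha_i}} K(\xi_{\alpha_i}, \eta) = \inf_{\eta} \sup_{\xi} K(\xi, \eta).$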


Proof: Because of (1.18) and Lemma 1.1 we have

(1.21)    $\lim_{i \to \infty} \inf_{\eta} \sup_{\xi_{\alpha_i}} K(\xi_{\alpha_i}, \eta) = \sup_{\xi} \inf_{\eta} K(\xi, \eta).$

Hence, (1.20) implies (1.19) and (1.19) implies (1.20). This proves Theorem 1.3.

Interchanging the role of the two players, we obtain from Theorem 1.3 the 
following theorem. 

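Theorem 1.4. If $\{\beta_i\}$ is a sequence of subsets of $B$ such that $\beta_i \subset \beta_{i+1}$ and
$\sum_{i=1}^{\infty} \beta_i = B$, and if

    $\sup_{\xi} \inf_{\eta_{\beta_i}} K(\xi, \eta_{\beta_i}) = \inf_{\eta_{\beta_i}} \sup_{\xi} K(\xi, \eta_{\beta_i}),$

then a necessary and sufficient condition for the validity of (1.19) is that

(1.22)    $\lim_{i \to \infty} \sup_{\xi} \inf_{\eta_{\beta_i}} K(\xi, \eta_{\beta_i}) = \sup_{\xi} \inf_{\eta} K(\xi, \eta).$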

In [3] an intrinsic metric was introduced in the spaces $A$ and $B$. The distance
of two elements $a_1$ and $a_2$ of $A$ is defined by

(1.23)    $\delta(a_1, a_2) = \sup_{b} | K(a_1, b) - K(a_2, b) |.$

Similarly, the distance between two points $b_1$ and $b_2$ of $B$ is defined by

(1.24)    $\delta(b_1, b_2) = \sup_{a} | K(a, b_1) - K(a, b_2) |.$
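For a finite game the two distances in (1.23) and (1.24) reduce to maxima over the entries of a payoff matrix. The following sketch assumes a hypothetical matrix `K` indexed as `K[a][b]`; the function names are illustrative only.

```python
def dist_rows(K, a1, a2):
    """Distance (1.23) between pure strategies a1 and a2 of player 1."""
    return max(abs(x - y) for x, y in zip(K[a1], K[a2]))


def dist_cols(K, b1, b2):
    """Distance (1.24) between pure strategies b1 and b2 of player 2."""
    return max(abs(row[b1] - row[b2]) for row in K)


# Hypothetical payoff matrix K[a][b].
K = [[3, 1, 4],
     [2, 5, 0],
     [6, 2, 3]]

d_rows = dist_rows(K, 0, 1)   # max(|3-2|, |1-5|, |4-0|)
d_cols = dist_cols(K, 0, 2)   # max(|3-4|, |2-0|, |6-3|)
```

Two strategies are close in this metric exactly when they lead to nearly the same outcome against every strategy of the opponent.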

Suppose that there exists a sequence $\{\alpha_i\}$ of subsets of $A$ such that $\alpha_i$ is
conditionally compact, $\alpha_i \subset \alpha_{i+1}$ and $\sum_{i=1}^{\infty} \alpha_i = A$.* It was shown in [3] that for
any conditionally compact subset $\alpha_i$ the relation (1.18) holds. Hence, according
to Theorem 1.3, a necessary and sufficient condition for the validity of (1.19)
is that (1.20) holds for a sequence $\{\alpha_i\}$ where $\alpha_i$ is conditionally compact,
$\alpha_i \subset \alpha_{i+1}$ and $\sum_{i=1}^{\infty} \alpha_i = A$. Similar remarks can be made concerning the space $B$.

The distance definitions given in (1.23) and (1.24) can be extended to the spaces
of the probability measures $\xi$ and $\eta$, respectively. That is,

(1.25)    $\delta(\xi_1, \xi_2) = \sup_{\eta} | K(\xi_1, \eta) - K(\xi_2, \eta) |$

* For a definition of compact and conditionally compact sets, see F. Hausdorff, Mengenlehre
(3rd edition), p. 107, or [3, p. 296].




and

(1.26)    $\delta(\eta_1, \eta_2) = \sup_{\xi} | K(\xi, \eta_1) - K(\xi, \eta_2) |.$

We shall say that a probability measure $\xi$ is discrete if there exists a denumerable
subset $\alpha$ of $A$ such that $\xi(\alpha) = 1$. Similarly, a probability measure $\eta$ will
be said to be discrete if $\eta(\beta) = 1$ for some denumerable subset $\beta$ of $B$. We shall
now prove the following theorem.

Theorem 1.5. If the choice of player 1 is restricted to elements of a class $C$ of
probability measures $\xi$ in which the class of all discrete probability measures $\xi$ is
dense, then a necessary and sufficient condition for the game to be strictly determined
is that there exists a sequence $\{a_i\}$ of elements of $A$ such that

(1.27)    $\lim_{i \to \infty} \inf_{\eta} \sup_{\xi_{\alpha_i}} K(\xi_{\alpha_i}, \eta) = \inf_{\eta} \sup_{\xi} K(\xi, \eta)$

where

    $\alpha_i = \{a_1, a_2, \dots, a_i\}.$

Proof: Since the class of all discrete probability measures $\xi$ lies dense in the
class $C$, there exists a sequence $\alpha = \{a_i\}$ ($i = 1, 2, \dots$, ad inf.)
such that

(1.28)    $\sup_{\xi_\alpha} \inf_{\eta} K(\xi_\alpha, \eta) = \sup_{\xi} \inf_{\eta} K(\xi, \eta).$

Since $\alpha_i = \{a_1, \dots, a_i\}$ is finite, we have

(1.29)    $\inf_{\eta} \sup_{\xi_{\alpha_i}} K(\xi_{\alpha_i}, \eta) = \sup_{\xi_{\alpha_i}} \inf_{\eta} K(\xi_{\alpha_i}, \eta).$

It then follows from Lemma 1.1 that

(1.30)    $\lim_{i \to \infty} \inf_{\eta} \sup_{\xi_{\alpha_i}} K(\xi_{\alpha_i}, \eta) = \sup_{\xi_\alpha} \inf_{\eta} K(\xi_\alpha, \eta) = \sup_{\xi} \inf_{\eta} K(\xi, \eta).$

Clearly, (1.30) and strict determinateness of the game imply (1.27). On the
other hand, any $\alpha = \{a_i\}$ that satisfies (1.27) will also satisfy (1.28) and (1.30).
But (1.27) and (1.30) imply that the game is strictly determined. Thus,
Theorem 1.5 is proved.

Theorem 1.6. If the choice of player 2 is restricted to elements of a class $C$ of
probability measures $\eta$ in which the class of all discrete probability measures $\eta$ lies
dense, then a necessary and sufficient condition for the strict determinateness of the
game is that there exists a sequence $\beta = \{b_i\}$ of elements of $B$ such that

(1.31)    $\lim_{i \to \infty} \sup_{\xi} \inf_{\eta_{\beta_i}} K(\xi, \eta_{\beta_i}) = \sup_{\xi} \inf_{\eta} K(\xi, \eta)$

where

    $\beta_i = \{b_1, \dots, b_i\}.$

This theorem is obtained from Theorem 1.5 by interchanging the players
1 and 2.





2. Statistical decision functions: the case of discrete chance variables.

2.1. The problem of statistical decisions and its interpretation as a zero sum two
person game. In some previous publications (see, for example, [3]) the author
has formulated the problem of statistical decisions as follows: Let $X = \{X^i\}$
($i = 1, 2, \dots$, ad inf.) be an infinite sequence of chance variables. Any particular
observation $x$ on $X$ is given by a sequence $x = \{x^i\}$ of real values where $x^i$
denotes the observed value of $X^i$. Suppose that the probability distribution
$F(x)$ of $X$ is not known. It is, however, known that $F$ is an element of a given
class $\Omega$ of distribution functions. There is, furthermore, a space $D$ given whose
elements $d$ represent the possible decisions that can be made in the problem
under consideration. Usually each element $d$ of $D$ will be associated with a
certain subset $\omega$ of $\Omega$ and making the decision $d$ can be interpreted as accepting
the hypothesis that the true distribution is included in the subset $\omega$. The
fundamental problem in statistics is to give a rule for making a decision, that is, a
rule for selecting a particular element $d$ of $D$ on the basis of the observed sample
point $x$. In other words, the problem is to construct a function $d(x)$, called
decision function, which associates with each sample point $x$ an element $d(x)$
of $D$ so that the decision $d(x)$ is made when the sample point $x$ is observed.

This formulation of the problem includes the sequential as well as the classical
non-sequential case. For any sample point $x$, let $n(x)$ be the number of components
of $x$ that must be known to be able to determine the value of $d(x)$. In
other words, $n(x)$ is the smallest positive integer such that $d(y) = d(x)$ for any $y$
whose first $n$ coordinates are equal to the first $n$ coordinates of $x$. If no finite
$n$ exists with the above property, we put $n = \infty$. Clearly, $n(x)$ is the number
of observations needed to reach a decision. To put in evidence the dependence
of $n(x)$ on the decision rule used, we shall occasionally write $n(x; \mathfrak{D})$ instead of
$n(x)$ where $\mathfrak{D}$ denotes the decision function $d(x)$ used. If $n(x)$ is constant over
the whole sample space, we have the classical case, that is, the case where a
decision is to be made on the basis of a predetermined number of observations.
If $n(x)$ is not constant over the sample space, we have the sequential case. A
basic question in statistics is this: What decision function should be chosen by
the statistician in any given problem? To set up principles for a proper choice of
a decision function, it is necessary to express in some way the degree of importance
of the various wrong decisions that can be made in the problem under
consideration. This may be expressed by a non-negative function $W(F, d)$,
called weight function, which is defined for all elements $F$ of $\Omega$ and all elements
$d$ of $D$. For any pair $(F, d)$, the value $W(F, d)$ expresses the loss caused by
making the decision $d$ when $F$ is the true distribution of $X$. For any positive
integer $n$, let $c(n)$ denote the cost of making $n$ observations. If the decision
function $\mathfrak{D} = d(x)$ is used, the expected loss plus the expected cost of
experimentation is given by

(2.1)    $r[F, \mathfrak{D}] = \int_M W[F, d(x)]\, dF(x) + \int_M c(n(x))\, dF(x)$

where $M$ denotes the sample space, i.e. the totality of all sample points $x$. We
shall use the symbol $\mathfrak{D}$ for $d(x)$ when we want to indicate that we mean the whole
decision function and not merely a value of $d(x)$ corresponding to some $x$.

The above expression (2.1) is called the risk. Thus, the risk is a real valued
non-negative function of two variables $F$ and $\mathfrak{D}$ where $F$ may be any element
of $\Omega$ and $\mathfrak{D}$ any decision rule that may be adopted by the statistician.
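To make the risk (2.1) concrete, the following sketch evaluates it for a fixed sample size rule in a toy problem. Everything specific here is an assumption for illustration: observations are independent and take the values 1 or 2, the distribution is described by the single number `p_one` (the probability of observing 1), the loss is 0 or 1, and the cost is proportional to the number of observations.

```python
from itertools import product


def risk(p_one, true_state, n, rule, loss, cost_per_obs):
    """Expected loss plus expected cost, as in (2.1), for a fixed sample size n."""
    total = 0.0
    for x in product((1, 2), repeat=n):      # every sample point of length n
        prob = 1.0
        for xi in x:                         # independent observations
            prob *= p_one if xi == 1 else 1.0 - p_one
        total += prob * (loss(true_state, rule(x)) + cost_per_obs * n)
    return total


def majority_rule(x):
    """Decide 'a' when more than half of the observations equal 1."""
    ones = sum(1 for xi in x if xi == 1)
    return "a" if 2 * ones > len(x) else "b"


def zero_one_loss(true_state, decision):
    """Weight function: no loss for a correct decision, unit loss otherwise."""
    return 0.0 if decision == true_state else 1.0


# Hypothetical distributions: under 'a' the value 1 has probability 0.8,
# under 'b' it has probability 0.2.
risk_a = risk(0.8, "a", 3, majority_rule, zero_one_loss, 0.05)
risk_b = risk(0.2, "b", 3, majority_rule, zero_one_loss, 0.05)
```

In both cases the probability of a wrong decision is 0.104 and the cost of three observations is 0.15, so each risk is about 0.254.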

Of course, the statistician would like to make the risk $r$ as small as possible.
The difficulty he faces in this connection is that $r$ depends on two arguments $F$
and $\mathfrak{D}$, and he can merely choose $\mathfrak{D}$ but not $F$. The true distribution $F$ is chosen,
we may say, by Nature and Nature's choice is usually entirely unknown to the
statistician. Thus, the situation that arises here is very similar to that of a
zero sum two person game. As a matter of fact, the statistical problem may be
interpreted as a zero sum two person game by setting up the following
correspondence:


Two Person Game                          Statistical Decision Problem
---------------                          ----------------------------
Player 1                                 Nature
Player 2                                 Statistician
Pure strategy $a$ of player 1            Choice of true distribution $F$ by Nature
Pure strategy $b$ of player 2            Choice of decision rule $\mathfrak{D} = d(x)$
Space $A$                                Space $\Omega$
Space $B$                                Space $Q$ of decision rules $\mathfrak{D}$ that can be used by
                                         the statistician
Outcome $K(a, b)$                        Risk $r(F, \mathfrak{D})$
Mixed strategy $\xi$ of player 1         Probability measure $\xi$ defined over an additive
                                         class of subsets of $\Omega$ (a priori probability
                                         distribution in the space $\Omega$)
Mixed strategy $\eta$ of player 2        Probability measure $\eta$ defined over an additive
                                         class of subsets of the space $Q$. We shall refer
                                         to $\eta$ as randomized decision function.
Outcome $K(\xi, \eta)$ when mixed        Risk $r(\xi, \eta) = \int_Q \int_\Omega r(F, \mathfrak{D})\, d\xi\, d\eta$
strategies are used


2.2. Formulation of some conditions concerning the spaces $\Omega$, $D$, the weight
function $W(F, d)$ and the cost function of experimentation. A general theory of
statistical decision functions was developed in [3] assuming the fulfillment of seven
conditions listed on pp. 297-8.* The conditions listed there are unnecessarily
restrictive and we shall replace them here by a considerably weaker set of
conditions.

In this chapter we shall restrict ourselves to the study of the case where each
of the chance variables $X^1, X^2, \dots$, ad inf. is discrete. We shall say that a chance
variable is discrete if it can take only countably many different values. Let
$a_{i1}, a_{i2}, \dots$, ad inf. denote the possible values of the chance variable $X^i$. Since
it is immaterial how the values are labeled, there is no loss of generality in
putting $a_{ij} = j$ ($j = 1, 2, 3, \dots$, ad inf.). Thus, we formulate the following
condition.

* In [3] only the continuous case is treated (existence of a density function is assumed),
but all the results obtained there can be extended without difficulty to the discrete case.

Condition 2.1. The chance variable $X^i$ ($i = 1, 2, \dots$, ad inf.) can take only
positive integral values.

As in [3], also here we postulate the boundedness of the weight function, i.e.,
we formulate the following condition.

Condition 2.2. The weight function $W(F, d)$ is a bounded function of $F$ and $d$.

To formulate Condition 2.3, we shall introduce some definitions. Let $\omega$ be a
given subset of $\Omega$. The distance between two elements $d_1$ and $d_2$ of $D$ relative to
$\omega$ is defined by

(2.2)    $\delta(d_1, d_2; \omega) = \sup_{F \in \omega} | W(F, d_1) - W(F, d_2) |.$

We shall refer to $\delta(d_1, d_2; \Omega)$ as the absolute distance, or more briefly, the
distance between $d_1$ and $d_2$. We shall say that a subset $D^*$ of $D$ is compact
(conditionally compact) relative to $\omega$, if it is compact (conditionally compact) in
the sense of the metric $\delta(d_1, d_2; \omega)$. If $D^*$ is compact relative to $\Omega$, we shall
say briefly that $D^*$ is compact.

An element $d$ of $D$ is said to be uniformly better than the element $d'$ of $D$
relative to a subset $\omega$ of $\Omega$ if

    $W(F, d) \le W(F, d')$ for all $F$ in $\omega$

and if

    $W(F, d) < W(F, d')$ for at least one $F$ in $\omega$.

A subset $D^*$ of $D$ is said to be complete relative to a subset $\omega$ of $\Omega$ if for any $d$
outside $D^*$ there exists an element $d^*$ in $D^*$ such that $d^*$ is uniformly better than
$d$ relative to $\omega$.
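For finite $\Omega$ and $D$ both definitions can be checked by direct enumeration. The sketch below is only an illustration; the weight table `W` and its labels are hypothetical.

```python
def uniformly_better(W, d, d_prime, omega):
    """True if d is uniformly better than d_prime relative to omega."""
    no_worse = all(W[F][d] <= W[F][d_prime] for F in omega)
    strictly = any(W[F][d] < W[F][d_prime] for F in omega)
    return no_worse and strictly


def is_complete(W, D_star, D, omega):
    """True if every d outside D_star is beaten by some element of D_star."""
    return all(
        any(uniformly_better(W, d_star, d, omega) for d_star in D_star)
        for d in D
        if d not in D_star
    )


# Hypothetical weight function W[F][d] on two distributions and three decisions.
W = {
    "F1": {"d1": 0, "d2": 1, "d3": 1},
    "F2": {"d1": 1, "d2": 0, "d3": 2},
}
omega = ["F1", "F2"]
D = ["d1", "d2", "d3"]
```

With this table `d2` is uniformly better than `d3`, so `{d1, d2}` is complete relative to `omega`, while `{d1}` alone is not because `d1` loses to `d2` under `F2`.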

Condition 2.3. For any positive integer $i$ and for any positive $\epsilon$ there exists a
subset $D^*_{i,\epsilon}$ of $D$ which is compact relative to $\Omega$ and complete relative to $\omega_{i,\epsilon}$, where
$\omega_{i,\epsilon}$ is the class of all elements $F$ of $\Omega$ for which prob $\{X^1 \le i\} \ge \epsilon$.

If $D$ is compact, then it is compact with respect to any subset $\omega$ of $\Omega$ and
Condition 2.3 is fulfilled. For any finite space $D$, Condition 2.3 is obviously
fulfilled. Thus, Condition 2.3 is fulfilled, for example, for any problem of testing
a statistical hypothesis $H$, since in that case the space $D$ contains only two
elements $d_1$ and $d_2$ where $d_1$ denotes the decision to reject $H$ and $d_2$ the decision to
accept $H$.

In [3] it was assumed that the cost of experimentation depends only on the
number of observations made. This assumption is unnecessarily restrictive.
The cost may depend also on the decision rule $\mathfrak{D}$ used. For example, let $\mathfrak{D}_1$
and $\mathfrak{D}_2$ be two decision rules such that $n(x; \mathfrak{D}_1)$ is equal to a constant $n_0$, while
$\mathfrak{D}_2$ is such that at any stage of the experimentation where $\mathfrak{D}_2$ requires taking at
least one additional observation the probability is positive that experimentation
will be terminated by taking only one more observation. Let $x^0$ be a particular
sample point for which $n(x^0; \mathfrak{D}_2) = n(x^0; \mathfrak{D}_1) = n_0$. There are undoubtedly
cases where the cost of experimentation is appreciably increased by the necessity
of having to look at the observations at each stage of the experiment before we
can decide whether or not to continue taking additional observations. Thus
in many cases the cost of experimentation when $x^0$ is observed may be greater
for $\mathfrak{D}_2$ than for $\mathfrak{D}_1$. The cost may also depend on the actual values of the
observations made. Thus, we shall assume that the cost $c$ is a single valued
function of the observations $x^1, \dots, x^m$ and the decision rule $\mathfrak{D}$ used, i.e.,
$c = c(x^1, \dots, x^m, \mathfrak{D})$.

Condition 2.4. The cost $c(x^1, \dots, x^m, \mathfrak{D})$ is non-negative and
$\lim c(x^1, \dots, x^m, \mathfrak{D}) = \infty$ uniformly in $x^1, \dots, x^m, \mathfrak{D}$ as $m \to \infty$. For each
positive integral value $m$, there exists a finite value $c_m$, depending only on $m$, such
that $c(x^1, \dots, x^m, \mathfrak{D}) \le c_m$ identically in $x^1, \dots, x^m, \mathfrak{D}$. Furthermore,
$c(x^1, \dots, x^m, \mathfrak{D}_1) = c(x^1, \dots, x^m, \mathfrak{D}_2)$ if $n(x; \mathfrak{D}_1) = n(x; \mathfrak{D}_2)$ for all $x$. Finally,
for any sample point $x$ we have $c(x^1, \dots, x^{n(x, \mathfrak{D}_1)}, \mathfrak{D}_1) \le c(x^1, \dots, x^{n(x, \mathfrak{D}_2)}, \mathfrak{D}_2)$
if there exists a positive integer $m$ such that $n(x, \mathfrak{D}_1) = n(x, \mathfrak{D}_2)$ when $n(x, \mathfrak{D}_2) < m$
and $n(x, \mathfrak{D}_1) = m$ when $n(x, \mathfrak{D}_2) \ge m$.

2.3. Alternative definition of a randomized decision function, and a further
condition on the cost function. In Section 2.1 we defined a randomized decision
function as a probability measure $\eta$ defined over some additive class of subsets
of the space $Q$ of all decision functions $d(x)$. Before formulating an alternative
definition of a randomized decision function, we have to make precise the meaning
of $\eta$ by stating the additive class $C_Q$ of subsets of $Q$ over which $\eta$ is defined.
Let $C_D$ be the smallest additive class of subsets of $D$ which contains all subsets
of $D$ which are open in the sense of the metric $\delta(d_1, d_2; \Omega)$. For any finite set of
positive integers $a_1, \dots, a_k$ and for any element $D^*$ of $C_D$, let $Q(a_1, \dots, a_k, D^*)$
be the set of all decision functions $d(x)$ which satisfy the following two
conditions: (1) If $x^1 = a_1, x^2 = a_2, \dots, x^k = a_k$, then $n(x) = k$; (2) If $x^1 = a_1, \dots,$
$x^k = a_k$, then $d(x)$ is an element of $D^*$. Let $C_Q^*$ be the class of all sets
$Q(a_1, \dots, a_k, D^*)$ corresponding to all possible values of $k, a_1, \dots, a_k$ and all
possible elements $D^*$ of $C_D$. The additive class $C_Q$ is defined as the smallest
additive class containing $C_Q^*$ as a subclass. Then with any $\eta$ we can associate
two sequences of functions

    $\{z_m(x^1, \dots, x^m \mid \eta)\}$

and

    $\{\delta_{x^1 \dots x^m}(D^* \mid \eta)\}$ ($m = 1, 2, \dots$, ad inf.)

where $0 \le z_m(x^1, \dots, x^m \mid \eta) \le 1$ and for any $x^1, \dots, x^m$, $\delta_{x^1 \dots x^m}(D^* \mid \eta)$ is a
probability measure in $D$ defined over the additive class $C_D$. Here

    $z_m(x^1, \dots, x^m \mid \eta)$

denotes the conditional probability that $n(x) > m$ under the condition that
the first $m$ observations are equal to $x^1, \dots, x^m$ and experimentation has not
been terminated for $(x^1, \dots, x^k)$ for ($k = 1, 2, \dots, m - 1$), while

    $\delta_{x^1 \dots x^m}(D^* \mid \eta)$

is the conditional probability that the final decision $d$ will be an element of $D^*$
under the condition that the sample $(x^1, \dots, x^m)$ is observed and $n(x) = m$.
Thus

(2.3)    $z_1(x^1 \mid \eta)\, z_2(x^1, x^2 \mid \eta) \cdots z_{m-1}(x^1, \dots, x^{m-1} \mid \eta)\, [1 - z_m(x^1, \dots, x^m \mid \eta)] = \eta[Q(x^1, \dots, x^m, D)]$

and

(2.4)    $\delta_{x^1 \dots x^m}(D^* \mid \eta) = \dfrac{\eta[Q(x^1, \dots, x^m, D^*)]}{\eta[Q(x^1, \dots, x^m, D)]}.$


We shall now consider two sequences of functions $\{z_m(x^1, \dots, x^m)\}$ and
$\{\delta_{x^1 \dots x^m}(D^*)\}$, not necessarily generated by a given $\eta$. An alternative definition
of a randomized decision function can be given in terms of these two sequences
as follows: After the first observation $x^1$ has been drawn, the statistician
determines whether or not experimentation be continued by a chance mechanism
constructed so that the probability of continuing experimentation is equal to
$z_1(x^1)$. If it is decided to terminate experimentation, the statistician uses a
chance mechanism to select the final decision $d$ constructed so that the
probability distribution of the selected $d$ is equal to $\delta_{x^1}(D^*)$. If it is decided to take
a second observation and the value $x^2$ is obtained, again a chance mechanism is
used to determine whether or not to stop experimentation such that the
probability of taking a third observation is equal to $z_2(x^1, x^2)$. If it is decided to stop
experimentation, a chance mechanism is used to select the final $d$ so that the
probability distribution of the selected $d$ is equal to $\delta_{x^1 x^2}(D^*)$, and so on.
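The chance mechanism just described can be sketched as a short simulation. This is only an illustration under stated assumptions: `observe`, `z`, and `delta` are hypothetical callables standing for the source of observations, the continuation probabilities $z_m$, and the terminal decision distributions $\delta_{x^1 \dots x^m}$ (assumed here to have finite support), and `max_obs` is a safety cap that is not part of the paper.

```python
import random


def run_randomized_rule(observe, z, delta, rng, max_obs=1000):
    """Simulate one run of a randomized decision function.

    observe(m) returns the m-th observation, z(xs) is the probability of
    continuing after the sample xs, and delta(xs) returns a pair
    (decisions, weights) describing the terminal decision distribution.
    """
    xs = []
    for m in range(1, max_obs + 1):
        xs.append(observe(m))
        sample = tuple(xs)
        # Continue with probability z(sample); otherwise stop and decide.
        if rng.random() >= z(sample):
            decisions, weights = delta(sample)
            return sample, rng.choices(decisions, weights=weights)[0]
    raise RuntimeError("no decision reached within max_obs observations")


rng = random.Random(0)
# Hypothetical rule: always take exactly three observations, then accept.
sample, decision = run_randomized_rule(
    observe=lambda m: m,
    z=lambda xs: 1.0 if len(xs) < 3 else 0.0,
    delta=lambda xs: (["accept", "reject"], [1.0, 0.0]),
    rng=rng,
)
```

A non-randomized rule is the special case in which every $z_m$ is 0 or 1 and every terminal distribution puts all its mass on one decision, as in the usage above.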

We shall denote by $\zeta$ a randomized decision function defined in terms of two
sequences $\{z_m(x^1, \dots, x^m)\}$ and $\{\delta_{x^1 \dots x^m}(D^*)\}$, as described above. Clearly,
any given $\eta$ generates a particular $\zeta$. Let $\zeta(\eta)$ denote the $\zeta$ generated by $\eta$.
One can easily verify that two different $\eta$'s may generate the same $\zeta$, i.e., there
exist two different $\eta$'s, say $\eta_1$ and $\eta_2$, such that $\zeta(\eta_1) = \zeta(\eta_2)$.

We shall now show that for any $\zeta$ there exists an $\eta$ such that $\zeta(\eta) = \zeta$. Let
$\zeta$ be given by the two sequences $\{z_m(x^1, \dots, x^m)\}$ and $\{\delta_{x^1 \dots x^m}(D^*)\}$. Let $b_j$
denote a sequence of $r_j$ positive integers, i.e., $b_j = (b_{j1}, \dots, b_{j r_j})$ ($j = 1, 2, \dots, k$)
subject to the restriction that no $b_j$ is equal to an initial segment of $b_l$ ($j \ne l$).
Let, furthermore, $D_1^*, \dots, D_k^*$ be $k$ elements of $C_D$. Finally, let
$Q(b_1, \dots, b_k, D_1^*, \dots, D_k^*)$ denote the class of all decision functions $d(x)$ which satisfy
the following condition: If $(x^1, \dots, x^{r_j}) = b_j$ then $n(x) = r_j$ and $d(x)$ is an
element of $D_j^*$ ($j = 1, \dots, k$). Let $\eta$ be a probability measure such that

(2.5)    $\eta[Q(b_1, \dots, b_k, D_1^*, \dots, D_k^*)]$
         $= \delta_{b_1}(D_1^*) \cdots \delta_{b_k}(D_k^*) \prod_{m=1}^{\infty} \prod_{x^m=1}^{\infty} \cdots \prod_{x^1=1}^{\infty} \{ z_m(x^1, \dots, x^m)^{g_m(x^1, \dots, x^m)} [1 - z_m(x^1, \dots, x^m)]^{g_m^*(x^1, \dots, x^m)} \}$

holds for all values of $k, b_1, \dots, b_k, D_1^*, \dots, D_k^*$. Here $g_m(x^1, \dots, x^m) = 1$
if $(x^1, \dots, x^m)$ is equal to an initial segment of at least one of the samples
$b_1, \dots, b_k$, but is not equal to any of the samples $b_1, \dots, b_k$. In all other
cases $g_m(x^1, \dots, x^m) = 0$. The function $g_m^*(x^1, \dots, x^m)$ is equal to 1 if
$(x^1, \dots, x^m)$ is equal to one of the samples $b_1, \dots, b_k$, and zero otherwise.
Clearly, for any $\eta$ which satisfies (2.5) we have $\zeta(\eta) = \zeta$.

The existence of such an $\eta$ can be shown as follows. With any finite set of
positive integers $i_1, \dots, i_r$ we associate an elementary event, say $A_r(i_1, \dots, i_r)$.
Let $\bar{A}_r(i_1, \dots, i_r)$ denote the negation of the event $A_r(i_1, \dots, i_r)$. Thus, we have a
denumerable system of elementary events by letting $r, i_1, \dots, i_r$ take any positive
integral values. We shall assume that the events $A_1(1), A_1(2), \dots$, ad inf. are
independent and the probability that $A_1(i)$ happens is equal to $z_1(i)$. We shall now
define the conditional probability of $A_2(i, j)$ knowing for any $k$ whether $A_1(k)$
or $\bar{A}_1(k)$ happened. If $A_1(i)$ happened, the conditional probability of $A_2(i, j)$ is
$z_2(i, j)$, and 0 otherwise. The conditional probability of the joint event that
$A_2(i_1, j_1), A_2(i_2, j_2), \dots, A_2(i_r, j_r), \bar{A}_2(i_{r+1}, j_{r+1}), \dots$, and $\bar{A}_2(i_{r+s}, j_{r+s})$ will
happen is the product of the conditional probabilities of each of these events
(knowing for each $i$ whether $A_1(i)$ or $\bar{A}_1(i)$ happened). Similarly, the
conditional probability (knowing for any $i$ and for any $(i, j)$ whether the
corresponding events $A_1(i)$ and $A_2(i, j)$ happened or not) that $A_3(i_1, j_1, k_1)$ and
$A_3(i_2, j_2, k_2)$ and $\cdots$ $A_3(i_r, j_r, k_r)$ and $\bar{A}_3(i_{r+1}, j_{r+1}, k_{r+1})$ and $\cdots$ and
$\bar{A}_3(i_{r+s}, j_{r+s}, k_{r+s})$ will simultaneously happen is equal to the product of the
conditional probabilities of each of them. The conditional probability of
$A_3(i, j, k)$ is equal to $z_3(i, j, k)$ if $A_1(i)$ and $A_2(i, j)$ happened, and zero otherwise;
and so on. Clearly, this system of probabilities is consistent.

If we interpret $A_r(i_1, \dots, i_r)$ as the event that the decision function $\mathfrak{D} = d(x)$
selected by the statistician has the property that $n(x; \mathfrak{D}) > r$ when $x^1 = i_1, \dots,$
$x^r = i_r$, the above defined system of probabilities for the denumerable
sequence $\{A_r(i_1, \dots, i_r)\}$ of events implies the validity of (2.5) for $D_j^* = D$
($j = 1, \dots, k$). The consistency of the formula (2.5) for $D_j^* = D$ implies,
as can easily be verified, the consistency of (2.5) also in the general case when
$D_j^* \ne D$.

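Let $\zeta_i$ be given by the sequences $\{z_{mi}(x^1, \dots, x^m)\}$ and $\{\delta_{x^1 \dots x^m, i}\}$
($m = 1, 2, \dots$, ad inf.). Let, furthermore, $\zeta$ be given by $\{z_m(x^1, \dots, x^m)\}$ and
$\{\delta_{x^1 \dots x^m}\}$. We say that

(2.6)    $\lim_{i \to \infty} \zeta_i = \zeta$

if for any $m, x^1, \dots, x^m$ we have

(2.7)    $\lim_{i \to \infty} z_{mi}(x^1, \dots, x^m) = z_m(x^1, \dots, x^m)$

and

(2.8)    $\lim_{i \to \infty} \delta_{x^1 \dots x^m, i}(D^*) = \delta_{x^1 \dots x^m}(D^*)$

for any open subset $D^*$ of $D$ whose boundary has probability measure zero
according to the limit probability measure $\delta_{x^1 \dots x^m}$.

In addition to Condition 2.4, we shall impose the following continuity
condition on the cost function.

Condition 2.5. If

    $\lim_{i \to \infty} \zeta(\eta_i) = \zeta(\eta_0),$

then

    $\lim_{i \to \infty} \int_{Q(x^1, \dots, x^m)} c(x^1, \dots, x^m, \mathfrak{D})\, d\eta_i = \int_{Q(x^1, \dots, x^m)} c(x^1, \dots, x^m, \mathfrak{D})\, d\eta_0,$

where $Q(x^1, \dots, x^m)$ is the class of all decision functions $\mathfrak{D}$ for which $n(y, \mathfrak{D}) = m$
if $y^1 = x^1, \dots, y^m = x^m$.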

2.4. The main theorem. In this section we shall show that the statistical
decision problem, viewed as a zero sum two person game, is strictly determined.
It will be shown in subsequent sections that this basic theorem has many
important consequences for the theory of statistical decision functions. A precise
formulation of the theorem is as follows:

Theorem 2.1. If Conditions 2.1-2.5 are fulfilled, the decision problem, viewed
as a zero sum two person game, is strictly determined, i.e.,

(2.9)    $\sup_{\xi} \inf_{\eta} r(\xi, \eta) = \inf_{\eta} \sup_{\xi} r(\xi, \eta).$

To prove the above theorem, we shall first derive several lemmas.

Lemma 2.1. For any $\epsilon > 0$, there exists a positive integer $m_\epsilon$, depending only
on $\epsilon$, such that the value of $\sup_{\xi} \inf_{\eta} r(\xi, \eta)$ is not changed by more than $\epsilon$ if we
restrict the choice of the statistician to decision functions $d(x)$ for which $n(x) \le m_\epsilon$
for all $x$.

Proof: Put $W_0 = \sup_{F, d} W(F, d)$ and choose $m_\epsilon$ so that

(2.10)    $c(x^1, \dots, x^m, \mathfrak{D}) > \dfrac{W_0^2}{\epsilon}$

identically in $x^1, \dots, x^m$ and $\mathfrak{D}$ for all $m \ge m_\epsilon$. The existence of such a value
$m_\epsilon$ follows from Condition 2.4. Consider the function $\inf_{\mathfrak{D}} r(\xi, \mathfrak{D})$. Our lemma
is proved, if we can show that for any $\xi$, the value of $\inf_{\mathfrak{D}} r(\xi, \mathfrak{D})$ is not increased
by more than $\epsilon$ if we restrict $\mathfrak{D}$ to be such that $n(x, \mathfrak{D}) \le m_\epsilon$ for all $x$. The latter
statement is proved, if we can show that for any decision function $\mathfrak{D}_1 = d_1(x)$
we can find another decision function $\mathfrak{D}_2 = d_2(x)$ such that $n(x, \mathfrak{D}_2) \le m_\epsilon$ for
all $x$ and $r(\xi, \mathfrak{D}_2) \le r(\xi, \mathfrak{D}_1) + \epsilon$. There are two cases to be considered: (a)
prob $\{n(X, \mathfrak{D}_1) > m_\epsilon \mid \xi\} \ge \epsilon / W_0$ and (b) prob $\{n(X, \mathfrak{D}_1) > m_\epsilon \mid \xi\} < \epsilon / W_0$.
In case (a) we have $r(\xi, \mathfrak{D}_1) \ge W_0$. In this case we can choose $\mathfrak{D}_2$ to be the rule
that we decide for some element $d_0$ of $D$ without taking any observations.
Clearly, for this choice of $\mathfrak{D}_2$ we shall have $r(\xi, \mathfrak{D}_2) \le r(\xi, \mathfrak{D}_1)$. In case (b)
we choose $\mathfrak{D}_2$ as follows:

    $d_2(x) = d_1(x)$ whenever $n(x, \mathfrak{D}_1) \le m_\epsilon$;

    $d_2(x) = d_0$ whenever $n(x, \mathfrak{D}_1) > m_\epsilon$,

where $d_0$ is an arbitrary element of $D$. Thus, $n(x, \mathfrak{D}_2) \le m_\epsilon$ for all $x$. Since
prob $\{n(X, \mathfrak{D}_1) > m_\epsilon \mid \xi\} < \epsilon / W_0$, it is clear that $r(\xi, \mathfrak{D}_2) \le r(\xi, \mathfrak{D}_1) + \epsilon$. Hence
our lemma is proved.
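As a small numerical illustration of how a truncation point can be found, the sketch below searches for the smallest $m$ at which a cost function exceeds the bound $W_0^2 / \epsilon$ used in the proof above. It assumes, purely for illustration, a cost that depends only on the number of observations and is non-decreasing in it; the function name and the linear cost are hypothetical.

```python
from fractions import Fraction


def truncation_point(W0, eps, cost):
    """Smallest m with cost(m) > W0**2 / eps, for a non-decreasing cost."""
    threshold = W0 * W0 / eps
    m = 1
    while cost(m) <= threshold:
        m += 1
    return m


# Hypothetical inputs: losses bounded by 1, tolerance 1/10,
# and a cost of 1/20 per observation. Fractions avoid rounding issues.
W0 = Fraction(1)
eps = Fraction(1, 10)
m_eps = truncation_point(W0, eps, lambda m: Fraction(1, 20) * m)
```

With these inputs the bound is 10, and 201 observations are the first to cost more than that, so every rule may be truncated at 201 observations at a price of at most `eps` in risk.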

Let $Q^m$ denote the class of decision functions $\mathfrak{D}$ for which $n(x; \mathfrak{D}) \le m$ for
all $x$. For any positive $\epsilon$, let $Q^{m,\epsilon}$ denote the class of all decision functions
which satisfy the following two conditions simultaneously: (1) $n(x, \mathfrak{D}) \le m$ for
all $x$; (2) $d(x)$ is an element of $D^*_{x^1,\epsilon}$, where $D^*_{x^1,\epsilon}$ denotes the subset of $D$ having
the properties stated in Condition 2.3. Clearly, $Q^{m,\epsilon} \subset Q^m$. A probability
measure $\eta$ will be denoted by $\eta^m$ if $\eta(Q^m) = 1$, and by $\eta^{m,\epsilon}$ if $\eta(Q^{m,\epsilon}) = 1$.

Lemma 2.2. The following inequality holds:

(2.11)    $\sup_{\xi} \inf_{\eta^m} r(\xi, \eta^m) \le \sup_{\xi} \inf_{\eta^{m,\epsilon}} r(\xi, \eta^{m,\epsilon}) \le \sup_{\xi} \inf_{\eta^m} r(\xi, \eta^m) + \epsilon W_0,$

where $W_0$ is an upper bound of $W(F, d)$.

Proof: The first half of (2.11) is obvious. If we replace the subscript $x^1$ by
the chance variable $X^1$, the set $\omega_{x^1,\epsilon}$ defined in Condition 2.3 will be a random
subset of $\Omega$. It follows easily from the definition of $\omega_{x^1,\epsilon}$ that

(2.12)    prob $\{F \in \omega_{X^1,\epsilon} \mid F\} \ge 1 - \epsilon.$

With any decision function $\mathfrak{D} = d(x)$ we shall associate another decision
function $\mathfrak{D}^* = d^*(x)$ such that $n(x, \mathfrak{D}) = n(x, \mathfrak{D}^*)$; $d^*(x) = d(x)$ whenever $d(x) \in D^*_{x^1,\epsilon}$;
and $d^*(x)$ is an element of $D^*_{x^1,\epsilon}$ that is uniformly better than $d(x)$
relative to $\omega_{x^1,\epsilon}$ whenever $d(x) \notin D^*_{x^1,\epsilon}$. It follows from (2.12) and the fact that
$W_0$ is an upper bound of $W(F, d)$ that

(2.13)    $r(F, \mathfrak{D}^*) \le r(F, \mathfrak{D}) + \epsilon W_0.$

The second half of (2.11) is an immediate consequence of (2.13) and our lemma
is proved.
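The bound (2.12) can be checked numerically for a single discrete distribution. The sketch below reads Condition 2.3 as defining $\omega_{i,\epsilon}$ by prob $\{X^1 \le i\} \ge \epsilon$, so that $F$ belongs to $\omega_{X^1,\epsilon}$ exactly when the cumulative probability at the observed value is at least $\epsilon$. The probability list `pmf` is a hypothetical example.

```python
def prob_F_in_omega(pmf, eps):
    """Probability, under F, that F lies in the random set omega_{X^1, eps}.

    pmf[i - 1] is the probability that X^1 equals i. F belongs to
    omega_{i, eps} when prob{X^1 <= i} >= eps.
    """
    total = 0.0
    cdf = 0.0
    for p in pmf:
        cdf += p
        if cdf >= eps:       # F is in omega_{i, eps} for this value i
            total += p
    return total


# Hypothetical distribution of X^1 on the values 1, 2, 3, 4.
pmf = [0.05, 0.15, 0.30, 0.50]
eps = 0.1
covered = prob_F_in_omega(pmf, eps)
```

Only the value 1 has cumulative probability below 0.1, so the excluded mass is 0.05 and the computed probability 0.95 is at least $1 - \epsilon = 0.9$.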

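Lemma 2.3. The equation

(2.14)    $\sup_{\xi} \inf_{\eta^{m,\epsilon}} r(\xi, \eta^{m,\epsilon}) = \inf_{\eta^{m,\epsilon}} \sup_{\xi} r(\xi, \eta^{m,\epsilon})$

holds for all $m$ and $\epsilon$.

Proof: For any positive integral values $m$, $k$ and for any $\rho > 0$, let $\Omega^{m,k,\rho}$ be
the class of all elements $F$ of $\Omega$ for which

    prob $\{X^1 \le k$ and $X^2 \le k$ and $\cdots$ and $X^m \le k\} \ge 1 - \rho.$

A probability measure $\xi$ for which $\xi(\Omega^{m,k,\rho}) = 1$ will be denoted by $\xi^{m,k,\rho}$. To
prove (2.14), we shall first prove the inequality

(2.15)    $| \sup_{\xi^{m,k,\rho}} \inf_{\eta^{m,\epsilon}} r(\xi^{m,k,\rho}, \eta^{m,\epsilon}) - \inf_{\eta^{m,\epsilon}} \sup_{\xi^{m,k,\rho}} r(\xi^{m,k,\rho}, \eta^{m,\epsilon}) | \le \rho (W_0 + c_m)$

where $c_m$ is an upper bound of $c(x^1, \dots, x^r, \mathfrak{D})$ for all $r \le m$, $x^1, \dots, x^r$ and $\mathfrak{D}$.

Since for any $d(x)$ in $Q^{m,\epsilon}$, $d(x)$ must be an element of $D^*_{x^1,\epsilon}$ and since $D^*_{x^1,\epsilon}$
is compact, it is sufficient to prove the validity of (2.14) in the case when $D^*_{x^1,\epsilon}$
is a finite set. Thus, we shall assume in the remainder of the proof that $D^*_{x^1,\epsilon}$
is finite.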

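Let $\delta$ be a given positive number and let $Q^{m,\epsilon,\delta}$ be a finite subset of $Q^{m,\epsilon}$
satisfying the following condition: for any element $\mathfrak{D} = d(x)$ in $Q^{m,\epsilon}$ there exists an
element $\mathfrak{D}^* = d^*(x)$ in $Q^{m,\epsilon,\delta}$ such that

    $d^*(x) = d(x)$ and $| c(x, \mathfrak{D}^*) - c(x, \mathfrak{D}) | \le \delta$

for all $x$ for which $x^1 \le k, x^2 \le k, \dots$, and $x^m \le k$. Clearly, for any choice of
$\delta$ there exists a finite subset $Q^{m,\epsilon,\delta}$ of $Q^{m,\epsilon}$ with the desired property. For any
$\mathfrak{D}$ in $Q^{m,\epsilon}$, we can then find an element $\mathfrak{D}^*$ in $Q^{m,\epsilon,\delta}$ such that

    $r(F, \mathfrak{D}^*) \le r(F, \mathfrak{D}) + \rho (W_0 + c_m) + \delta$

for all $F$ in $\Omega^{m,k,\rho}$. From this it follows that

(2.16)    $\sup_{\xi^{m,k,\rho}} \inf_{\eta^{m,\epsilon}} r \le \sup_{\xi^{m,k,\rho}} \inf_{\eta^{m,\epsilon,\delta}} r \le \sup_{\xi^{m,k,\rho}} \inf_{\eta^{m,\epsilon}} r + \rho (W_0 + c_m) + \delta$

(2.17)    $\inf_{\eta^{m,\epsilon}} \sup_{\xi^{m,k,\rho}} r \le \inf_{\eta^{m,\epsilon,\delta}} \sup_{\xi^{m,k,\rho}} r \le \inf_{\eta^{m,\epsilon}} \sup_{\xi^{m,k,\rho}} r + \rho (W_0 + c_m) + \delta$

where $\eta^{m,\epsilon,\delta}$ denotes any $\eta$ for which $\eta(Q^{m,\epsilon,\delta}) = 1$. Since $Q^{m,\epsilon,\delta}$ is finite, we have

(2.18)    $\sup_{\xi^{m,k,\rho}} \inf_{\eta^{m,\epsilon,\delta}} r = \inf_{\eta^{m,\epsilon,\delta}} \sup_{\xi^{m,k,\rho}} r.$

Inequality (2.15) follows from (2.16), (2.17) and (2.18) and the fact that $\delta$
can be chosen arbitrarily small.

Lemmas 1.1, 1.3 and the inequality (2.15) imply that Lemma 2.3 must hold
if

(2.19)    $\lim_{k \to \infty} \inf_{\eta^{m,\epsilon}} \sup_{\xi^{m,k,\rho}} r = \inf_{\eta^{m,\epsilon}} \sup_{\xi} r$

holds. Thus, the proof of Lemma 2.3 is completed if we can show the validity
of (2.19).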

Let $\{\eta_k^{m,\epsilon}\}$ ($k = 1, 2, \dots$, ad inf.) be a sequence of randomized decision
functions such that

(2.20)    $\lim_{k \to \infty} [ \sup_{\xi^{m,k,\rho}} r(\xi^{m,k,\rho}, \eta_k^{m,\epsilon}) - \inf_{\eta^{m,\epsilon}} \sup_{\xi^{m,k,\rho}} r(\xi^{m,k,\rho}, \eta^{m,\epsilon}) ] = 0.$

Let $\zeta_k = \zeta(\eta_k^{m,\epsilon})$ (see definition in Section 2.3) and let $\zeta_k$ be given by the two
sequences of functions $\{z_{r,k}(x^1, \dots, x^r)\}$ and $\{\delta_{x^1 \dots x^r, k}\}$ ($r = 1, 2, \dots, m$).
Since there are only countably many samples $(x^1, \dots, x^r)$ ($r \le m$), there exists
a subsequence $\{k_l\}$ of the sequence $\{k\}$ such that

(2.21)    $\lim_{l \to \infty} z_{r,k_l}(x^1, \dots, x^r) = z_r(x^1, \dots, x^r)$

and

(2.22)    $\lim_{l \to \infty} \delta_{x^1 \dots x^r, k_l} = \delta_{x^1 \dots x^r}$

for all $r$ and all samples $(x^1, \dots, x^r)$. Let $\eta_0^{m,\epsilon}$ be a randomized decision
function such that $\zeta(\eta_0^{m,\epsilon})$ is equal to the $\zeta$ defined by $\{z_r(x^1, \dots, x^r)\}$ and $\{\delta_{x^1 \dots x^r}\}$
($r = 1, 2, \dots, m$).

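For any element $F$ of $\Omega$ and for any $\nu > 0$, there exists a finite subset $M_\nu$ of
the $m$-dimensional sample space such that the probability (under $F$) that the
sample $(x^1, \dots, x^m)$ will fall in $M_\nu$ is $\ge 1 - \nu$. From this and the continuity
of the cost function (Condition 2.5) it follows that

(2.23)    $\lim_{l \to \infty} r(F, \eta_{k_l}^{m,\epsilon}) = r(F, \eta_0^{m,\epsilon})$ for all $F$.

Clearly,

(2.24)    $\sup_{\xi^{m,k,\rho}} r(\xi^{m,k,\rho}, \eta) = \sup_{F^{m,k,\rho}} r(F^{m,k,\rho}, \eta)$

where $F^{m,k,\rho}$ is an element of $\Omega^{m,k,\rho}$. Hence

(2.25)    $\inf_{\eta^{m,\epsilon}} \sup_{\xi^{m,k,\rho}} r(\xi^{m,k,\rho}, \eta^{m,\epsilon}) = \inf_{\eta^{m,\epsilon}} \sup_{F^{m,k,\rho}} r(F^{m,k,\rho}, \eta^{m,\epsilon}).$

Since any $F$ in $\Omega$ is contained in $\Omega^{m,k,\rho}$ for sufficiently large $k$, it follows from (2.20)
and (2.25) that

(2.26)    $\lim_{l \to \infty} r(F, \eta_{k_l}^{m,\epsilon}) \le \lim_{k \to \infty} \{ \inf_{\eta^{m,\epsilon}} \sup_{F^{m,k,\rho}} r(F^{m,k,\rho}, \eta^{m,\epsilon}) \}.$

Hence, because of (2.23),

(2.27)    $r(F, \eta_0^{m,\epsilon}) \le \lim_{k \to \infty} \{ \inf_{\eta^{m,\epsilon}} \sup_{F^{m,k,\rho}} r(F^{m,k,\rho}, \eta^{m,\epsilon}) \}.$

Thus,

(2.28)    $\inf_{\eta^{m,\epsilon}} \sup_{F} r(F, \eta^{m,\epsilon}) \le \lim_{k \to \infty} \{ \inf_{\eta^{m,\epsilon}} \sup_{F^{m,k,\rho}} r(F^{m,k,\rho}, \eta^{m,\epsilon}) \}.$

Since the left hand member of (2.28) cannot be smaller than the right hand
member, the equality sign must hold. This concludes the proof of Lemma 2.3.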

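Theorem 2.1 can easily be proved with the help of Lemmas 2.1, 2.2 and 2.3.
From Lemma 2.2 it follows that

(2.29)    $\lim_{\epsilon \to 0} \sup_{\xi} \inf_{\eta^{m,\epsilon}} r = \sup_{\xi} \inf_{\eta^m} r.$

From this and Lemma 2.3 we obtain

(2.30)    $\lim_{\epsilon \to 0} \inf_{\eta^{m,\epsilon}} \sup_{\xi} r = \sup_{\xi} \inf_{\eta^m} r.$

But

    $\lim_{\epsilon \to 0} \inf_{\eta^{m,\epsilon}} \sup_{\xi} r \ge \inf_{\eta^m} \sup_{\xi} r.$

Hence

(2.31)    $\inf_{\eta^m} \sup_{\xi} r \le \sup_{\xi} \inf_{\eta^m} r.$

Hence, because of Lemma 1.3, we then must have

(2.32)    $\sup_{\xi} \inf_{\eta^m} r = \inf_{\eta^m} \sup_{\xi} r.$

It follows from Lemma 2.1 that

(2.33)    $\lim_{m \to \infty} \sup_{\xi} \inf_{\eta^m} r = \sup_{\xi} \inf_{\eta} r.$

Hence, because of (2.32), we have

(2.34)    $\lim_{m \to \infty} \inf_{\eta^m} \sup_{\xi} r = \sup_{\xi} \inf_{\eta} r.$

But

(2.35)    $\lim_{m \to \infty} \inf_{\eta^m} \sup_{\xi} r \ge \inf_{\eta} \sup_{\xi} r.$

Hence

(2.36)    $\inf_{\eta} \sup_{\xi} r \le \sup_{\xi} \inf_{\eta} r.$

Theorem 2.1 is an immediate consequence of (2.36) and Lemma 1.3.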

2.5. Theorems on complete classes of decision functions and minimax solutions.
For any positive $\epsilon$ we shall say that the randomized decision function $\eta_0$ is an
$\epsilon$-Bayes solution relative to the a priori distribution $\xi$ if

(2.37)    $r(\xi, \eta_0) \le \inf_{\eta} r(\xi, \eta) + \epsilon.$

If $\eta_0$ satisfies (2.37) for $\epsilon = 0$, we shall say that $\eta_0$ is a Bayes solution relative
to $\xi$.
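For a finite problem the $\epsilon$-Bayes property in (2.37) can be checked directly. The sketch below uses a hypothetical risk table over two distributions and two non-randomized rules. Since $r(\xi, \eta)$ is linear in $\eta$, with finitely many rules the infimum over randomized rules is attained at a non-randomized one, so a minimum over the table columns suffices here.

```python
def bayes_risk(risk, prior, rule):
    """Average risk of a rule under the a priori distribution `prior`."""
    return sum(prior[F] * risk[F][rule] for F in prior)


def is_eps_bayes(risk, prior, rule, eps):
    """Check (2.37) for `rule`, with the infimum taken over the listed rules."""
    rules = next(iter(risk.values())).keys()
    best = min(bayes_risk(risk, prior, r) for r in rules)
    return bayes_risk(risk, prior, rule) <= best + eps


# Hypothetical risk table risk[F][rule] and a uniform a priori distribution.
risk = {
    "F1": {"r1": 0.1, "r2": 0.3},
    "F2": {"r1": 0.5, "r2": 0.2},
}
prior = {"F1": 0.5, "F2": 0.5}
```

Here `r2` has the smaller average risk and is a Bayes solution, while `r1` is an $\epsilon$-Bayes solution for $\epsilon = 0.1$ but not for $\epsilon = 0.01$.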

A randomized decision rule $\eta_1$ is said to be uniformly better than $\eta_2$ if

(2.38)    $r(F, \eta_1) \le r(F, \eta_2)$ for all $F$

and if

(2.39)    $r(F, \eta_1) < r(F, \eta_2)$ at least for one $F$.

A class $C$ of randomized decision functions $\eta$ is said to be complete if for any
$\eta$ not in $C$ we can find an element $\eta^*$ in $C$ such that $\eta^*$ is uniformly better than $\eta$.




Theorem 2.2. If Conditions 2.1-2.5 are fulfilled, then for any $\epsilon > 0$ the class
$C_\epsilon$ of all $\epsilon$-Bayes solutions corresponding to all possible a priori distributions $\xi$
is a complete class.

Proof: Let $\eta_0$ be a randomized decision function that is not an $\epsilon$-Bayes
solution relative to any $\xi$. That is,

(2.40)    $r(\xi, \eta_0) > \inf_{\eta} r(\xi, \eta) + \epsilon$ for all $\xi$.

If $r(F, \eta_0) = \infty$ for all $F$, then there is evidently an element of $C_\epsilon$ that is
uniformly better than $\eta_0$. Thus, we can restrict ourselves to the case where

(2.41)    $r(F, \eta_0) < \infty$ at least for one $F$.

Put

(2.42)    $W^*(F, d) = W(F, d) - r(F, \eta_0)$

and let $r^*(\xi, \eta)$ denote the risk when $W(F, d)$ is replaced by $W^*(F, d)$. Then

(2.43)    $r^*(\xi, \eta) = r(\xi, \eta) - r(\xi, \eta_0).$

Let $Q^m$ denote the class of all decision functions $d(x)$ for which $n(x) \le m$
identically in $x$. Furthermore, denote any $\eta$ for which $\eta(Q^m) = 1$ by $\eta^m$. We
shall first prove the following relation:

(2.44)    $\sup_{\xi} \inf_{\eta^m} r^*(\xi, \eta^m) = \inf_{\eta^m} \sup_{\xi} r^*(\xi, \eta^m)$

for any positive integral value $m$. For any positive constant $c$, let $\Omega_c$ denote the
class of all elements $F$ for which $r(F, \eta_0) \le c$.

Clearly, Conditions 2.1-2.6 remain valid if we replace W(,F, d) by TF*(F, d) 
and H by where c is restricted to values for which Qe is not empty. Hence, 
Theorem 2.1 can be applied and we obtain 

(2.45) Sup Inf r*({*, q") = Inf Sup r*({*, q"), 

where t denotes any ( for which ((fl,) 1. Let h and w be two positive values 

for which 

(2.46) Sup Inf r*($®, q") ^ —h for all c 

|e |,m 

and 

(2.47) r(F, q") ^ w for all F sind all q". 

Clearly, such two constants h and a exist. From (2.46) and Lemma 1.3 we ob¬ 
tain 

(2.48) Inf Sup r* (f, q") ^ —h. 

t 

Since 

(2.49) r*(F, q") < —(h + S) for any F not in ft*+o+»(5 > 0), 





it follows from (2.48) that

(2.50)  $\inf_{\eta^m} \sup_{\xi^c} r^* = \inf_{\eta^m} \sup_{\xi} r^*$ for all $c > h + w$.

From (2.45) and (2.50) we obtain

(2.51)  $\sup_{\xi^c} \inf_{\eta^m} r^* = \inf_{\eta^m} \sup_{\xi} r^*$ for all $c > h + w$.

Hence,

(2.51a)  $\sup_{\xi} \inf_{\eta^m} r^* \ge \inf_{\eta^m} \sup_{\xi} r^*.$

Because of Lemma 1.3, the equality sign must hold and (2.44) is proved.
Since $\eta_0$ is not an element of $C_\epsilon$, we must have

(2.52)  $\inf_{\eta} r(\xi, \eta) < r(\xi, \eta_0) - \epsilon.$

From this it follows that

(2.53)  $\inf_{\eta} r^*(\xi, \eta) \le -\epsilon.$

Hence

(2.54)  $\sup_{\xi} \inf_{\eta} r^*(\xi, \eta) \le -\epsilon.$

It was shown in the proof of Lemma 2.1 that for any $\rho > 0$ there exists a
positive integer $m_\rho$, depending only on $\rho$, such that

(2.55)  $\inf_{\eta^{m_\rho}} r(\xi, \eta^{m_\rho}) \le \inf_{\eta} r(\xi, \eta) + \rho$ for all $\xi$.

From (2.44), (2.54) and (2.55) it follows that there exists a positive integer
$m_0$, namely $m_0 = m_{\epsilon/2}$, such that

(2.56)  $\inf_{\eta^m} \sup_{\xi} r^*(\xi, \eta^m) \le -\epsilon/2$ for any $m \ge m_0$.

From (2.44) and (2.56) it follows that there exists an a priori distribution $\xi_1$
and an $\epsilon$-Bayes solution $\eta_1^m$ relative to $\xi_1$ such that

(2.57) r*(F, v7) ^ for all F. 

Hence, because of (2.43), 

(2.58) r(F, uD g r(F, uo) - | for aU F. 
and Theorem 2.2 is proved. 

Theorem 2.3. If $D$ is compact, and if Conditions 2.1, 2.2, 2.4, 2.5 are fulfilled,
then there exists a minimax solution, i.e., a decision rule $\eta_0$ for which

(2.59)  $\sup_{F} r(F, \eta_0) \le \sup_{F} r(F, \eta)$ for all $\eta$.


To prove the above theorem, we shall first prove the following lemma.

Lemma 2.4. If $D$ is compact and if Conditions 2.1, 2.2, 2.4, 2.5 are fulfilled,
then for any sequence $\{\eta_i\}$ ($i = 1, 2, \dots$, ad inf.) of randomized decision functions
for which $r(F, \eta_i)$ is a bounded function of $F$ and $i$, there exists a subsequence
$\{\eta_{i_j}\}$ ($j = 1, 2, \dots$, ad inf.) and a randomized decision function $\eta_0$ such that

(2.60)  $\liminf_{j \to \infty} r(\xi, \eta_{i_j}) \ge r(\xi, \eta_0)$ for all $\xi$.

Proof: Let $\zeta_i = \zeta(\eta_i)$ (defined in Section 2.3) be given by $\{z_{r,i}(x^1, \dots, x^r)\}$
and $\{\delta_{x^1 \dots x^r, i}\}$ ($r = 1, 2, \dots$, ad inf.). Thus, $z_{r,i}(x^1, \dots, x^r)$ is the conditional
probability that we shall take an observation on $X^{r+1}$ using the rule $\eta_i$ and knowing
that the first $r$ observations are given by $x^1, \dots, x^r$ and that experimentation was
not terminated for $(x^1, \dots, x^k)$ $(k < r)$. As stated in Section 2.3, for any
$r, x^1, \dots, x^r$ the symbol $\delta_{x^1 \dots x^r, i}$ denotes the conditional probability
distribution of the selected $d$ when $\eta_i$ is used and it is known that the first $r$
observations are equal to $x^1, \dots, x^r$ and that $n(x) = r$. Since there are only
countably many finite samples $(x^1, \dots, x^r)$, it is possible to find a subsequence
$\{i_j\}$ of $\{i\}$ such that $\lim_{j \to \infty} z_{r,i_j}(x^1, \dots, x^r)$ and
$\lim_{j \to \infty} \delta_{x^1 \dots x^r, i_j}$ exist.* Put

(2.61)  $\lim_{j \to \infty} z_{r,i_j}(x^1, \dots, x^r) = z_{r,0}(x^1, \dots, x^r)$

and

(2.62)  $\lim_{j \to \infty} \delta_{x^1 \dots x^r, i_j} = \delta_{x^1 \dots x^r, 0}.$

As shown in Section 2.3, there exists a randomized decision function $\eta_0$ such that
$\zeta_0 = \zeta(\eta_0)$ is given by $\{z_{r,0}(x^1, \dots, x^r)\}$ and $\{\delta_{x^1 \dots x^r, 0}\}$. Let
$q_{r,i}(x^1, \dots, x^r \mid \xi)$ denote the probability that the sample $(x^1, \dots, x^r)$ will be
obtained and that experimentation will be stopped at the $r$-th observation when $\xi$ is
the a priori distribution and $\eta_i$ is the decision rule used by the statistician. For
any sample $(x^1, \dots, x^r)$ let $R_i(x^1, \dots, x^r)$ denote the expected value of $W(F, d)$
when the distribution of $F$ is equal to the a posteriori distribution of $F$ as implied
by $\xi$ and $(x^1, \dots, x^r)$ and where $d$ is a chance variable independent of $F$ with the
probability distribution $\delta_{x^1 \dots x^r, i}$. Since $r(\xi, \eta_i)$ is bounded by assumption,
the probability that experimentation will go on indefinitely is equal to zero.
From this it follows that

(2.63)  $\sum_{r,\, x^1, \dots, x^r} q_{r,i}(x^1, \dots, x^r \mid \xi) = 1$ for all $\xi$.

* The existence of $\lim_{j \to \infty} \delta_{x^1 \dots x^r, i_j}$ follows from the compactness of $D$
(see Theorem 3.6 in [3]).





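Then $r(\xi, \eta_i)$ is given by

(2.64)  $r(\xi, \eta_i) = \sum_{r,\, x^1, \dots, x^r} q_{r,i}(x^1, \dots, x^r \mid \xi)\, R_i(x^1, \dots, x^r) + \sum_{r,\, x^1, \dots, x^r} p_r(x^1, \dots, x^r \mid \xi) \int_{Q_{x^1 \dots x^r}} c(x^1, \dots, x^r; \mathfrak{D})\, d\eta_i,$

where $p_r(x^1, \dots, x^r \mid \xi)$ is the probability, under $\xi$, that the first $r$ observations
are equal to $x^1, \dots, x^r$, and $Q_{x^1 \dots x^r}$ is the totality of all decision functions
$d(x)$ for which $n(y) = r$ whenever $y^1 = x^1, \dots, y^r = x^r$. Clearly,

(2.65)  $\lim_{j \to \infty} q_{r,i_j}(x^1, \dots, x^r \mid \xi) = q_{r,0}(x^1, \dots, x^r \mid \xi).$

Since $D$ is compact and since $W(F, d)$ is a continuous function of $d$ uniformly
in $F$ (in the sense of the metric defined in $D$), we have

(2.66)  $\lim_{j \to \infty} R_{i_j}(x^1, \dots, x^r) = R_0(x^1, \dots, x^r).$

From Condition 2.5 it follows that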


(2.67) 



Lemma 2.4 is an immediate consequence of the equations (2.64)-(2.67).

We are now in a position to prove Theorem 2.3. Because of Theorem 2.1
there exists a sequence $\{\eta_i\}$ such that

(2.68)  $\lim_{i \to \infty} \sup_{F} r(F, \eta_i) = \inf_{\eta} \sup_{F} r(F, \eta).$

According to Lemma 2.4 there exists a subsequence $\{\eta_{i_j}\}$ ($j = 1, 2, \dots$, ad inf.)
and a randomized decision function $\eta_0$ such that

(2.69)  $\liminf_{j \to \infty} r(F, \eta_{i_j}) \ge r(F, \eta_0)$ for all $F$.

It follows from (2.68) and (2.69) that $\eta_0$ is a minimax solution and Theorem
2.3 is proved.
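In a finite toy problem the minimax solution asserted by Theorem 2.3 can be found by direct search. The sketch below is only an illustration under stated assumptions: the two-state, two-rule risk table `RISK` is hypothetical, and the search over mixing probabilities uses a grid of hundredths, which happens to contain the exact minimizer for this table.

```python
from fractions import Fraction

# Hypothetical risk table: RISK[rule][state] = r(F_state, rule).
RISK = ((Fraction(1), Fraction(4)), (Fraction(3), Fraction(2)))
GRID = [Fraction(k, 100) for k in range(101)]


def mixed_risk(t, state):
    """Risk at `state` of the randomized rule that uses rule 0 with probability t."""
    return t * RISK[0][state] + (1 - t) * RISK[1][state]


def sup_risk(t):
    """Sup over F of the risk of the mixture with weight t."""
    return max(mixed_risk(t, state) for state in (0, 1))


# Minimax mixture on the grid: the t that minimizes the maximal risk.
t_minimax = min(GRID, key=sup_risk)
```

For this table the two risks are `3 - 2t` and `2 + 2t`, so the maximal risk is smallest where they are equal; the search returns that equalizing mixture.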

Theorem 2.4. If $D$ is compact and if Conditions 2.1, 2.2, 2.4, 2.5 are fulfilled,
then for any $\xi$ there exists a Bayes solution relative to $\xi$.

This theorem is an immediate consequence of Lemma 2.4.

We shall say that $\eta_0$ is a Bayes solution in the wide sense, if there exists a
sequence $\{\xi_i\}$ ($i = 1, 2, \dots$, ad inf.) such that

(2.70)  $\lim_{i \to \infty} [r(\xi_i, \eta_0) - \inf_{\eta} r(\xi_i, \eta)] = 0.$

We shall say that $\eta_0$ is a Bayes solution in the strict sense, if there exists a $\xi$
such that $\eta_0$ is a Bayes solution relative to $\xi$.





Theorem 2.5. If $D$ is compact and Conditions 2.1-2.5 hold, then the class of all
Bayes solutions in the wide sense is a complete class.

Proof: Let $\eta_0$ be a decision rule that is not a Bayes solution in the wide sense.
Consider the weight function $W^*(F, d) = W(F, d) - r(F, \eta_0)$. We may assume
that $r(F, \eta_0) < \infty$ for at least some $F$, since otherwise there obviously exists a
Bayes solution in the wide sense that is uniformly better than $\eta_0$. Then it
follows easily from (2.44) and Lemmas 2.1 and 1.3 that

(2.71)  $\sup_{\xi} \inf_{\eta} r^*(\xi, \eta) = \inf_{\eta} \sup_{\xi} r^*(\xi, \eta) = v^*$ (say),

where $r^*(\xi, \eta)$ is the risk corresponding to $W^*(F, d)$, i.e.,

(2.72)  $r^*(\xi, \eta) = r(\xi, \eta) - r(\xi, \eta_0).$

Theorem 2.3 is clearly applicable to the risk function $r^*(\xi, \eta)$. Then, there
exists a minimax solution $\eta_1$ for the problem corresponding to the new weight
function $W^*(F, d)$. Since, because of (2.72), $v^* \le 0$, we have

(2.73)  $r^*(\xi, \eta_1) = r(\xi, \eta_1) - r(\xi, \eta_0) \le 0$ for all $\xi$.

Our theorem is proved, if we can show that $\eta_1$ is a Bayes solution in the wide
sense. Let $\{\xi_i\}$ ($i = 1, 2, \dots$, ad inf.) be a sequence of a priori distributions
such that

(2.74)  $\lim_{i \to \infty} \inf_{\eta} r^*(\xi_i, \eta) = v^*.$

Since $\eta_1$ is a minimax solution, we must have

(2.75)  $r^*(\xi_i, \eta_1) \le v^*.$

It follows from (2.74) and (2.75) that $\eta_1$ is a Bayes solution in the wide sense
and our theorem is proved.

We shall now formulate an additional condition which will permit the derivation
of some stronger theorems. First, we shall give a convergence definition
in the space $\Omega$. We shall say that $F_i$ converges to $F$ in the ordinary sense if

(2.76)  $\lim_{i \to \infty} p_r(x^1, \dots, x^r \mid F_i) = p_r(x^1, \dots, x^r \mid F)$  ($r = 1, 2, \dots$, ad inf.).

Here $p_r(x^1, \dots, x^r \mid F)$ denotes the probability, under $F$, that the first $r$
observations will be equal to $x^1, \dots, x^r$, respectively. We shall say that a subset $\omega$
of $\Omega$ is compact in the ordinary sense, if $\omega$ is compact in the sense of the
convergence definition (2.76).

Condition 2.6. The space $\Omega$ is compact in the ordinary sense. If $F_i$
converges to $F$, as $i \to \infty$, in the ordinary sense, then

$\lim_{i \to \infty} W(F_i, d) = W(F, d)$

uniformly in $d$.

Theorem 2.6. If $D$ is compact and if Conditions 2.1, 2.2, 2.4, 2.5, 2.6 hold,
then:

(i) there exists a least favorable a priori distribution, i.e., an a priori distribution
$\xi_0$ for which

$\inf_{\eta} r(\xi_0, \eta) = \sup_{\xi} \inf_{\eta} r(\xi, \eta).$

(ii) A minimax solution exists and any minimax solution is a Bayes solution
in the strict sense.

(iii) If $\eta_0$ is a decision rule which is not a Bayes solution in the strict sense and
for which $r(F, \eta_0)$ is a bounded function of $F$, then there exists a decision rule $\eta_1$
which is a Bayes solution in the strict sense and is uniformly better than $\eta_0$.
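Statements (i) and (ii) can be seen at work in a finite toy problem. The sketch below is an illustration only: the risk table `RISK` is hypothetical, priors on the two states are written as $(p, 1 - p)$, and the search for the least favorable prior uses a grid of hundredths. For this table the minimax mixture puts weight $1/4$ on rule 0 (it equalizes the two risks $3 - 2t$ and $2 + 2t$), and the code checks that this mixture is a Bayes solution relative to the least favorable prior found.

```python
from fractions import Fraction

# Hypothetical risk table: RISK[rule][state] = r(F_state, rule).
RISK = ((Fraction(1), Fraction(4)), (Fraction(3), Fraction(2)))
GRID = [Fraction(k, 100) for k in range(101)]


def average_risk(p, t):
    """r(xi, eta) for the prior xi = (p, 1 - p) and the mixture using rule 0 with probability t."""
    risk_f1 = t * RISK[0][0] + (1 - t) * RISK[1][0]
    risk_f2 = t * RISK[0][1] + (1 - t) * RISK[1][1]
    return p * risk_f1 + (1 - p) * risk_f2


def bayes_value(p):
    """Inf over rules of r(xi, eta); by linearity in t it is attained at t = 0 or t = 1."""
    return min(average_risk(p, Fraction(0)), average_risk(p, Fraction(1)))


# Least favorable prior on the grid: the p that maximizes the Bayes value.
p_star = max(GRID, key=bayes_value)
```

The Bayes value at `p_star` equals the maximal risk of the minimax mixture, which is the finite analogue of the equality in (i).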

Proof: Let $\{\xi_i\}$ ($i = 1, 2, \dots$, ad inf.) be a sequence of a priori distributions
such that

(2.77)  $\lim_{i \to \infty} \inf_{\eta} r(\xi_i, \eta) = \sup_{\xi} \inf_{\eta} r(\xi, \eta).$

Since $\Omega$ is compact in the ordinary sense, there exists an a priori distribution
$\xi_0$ and a subsequence $\{\xi_{i_j}\}$ of $\{\xi_i\}$ such that

(2.78)  $\lim_{j \to \infty} \xi_{i_j}(\omega) = \xi_0(\omega)$

for any subset $\omega$ of $\Omega$ which is open (in the sense of the ordinary convergence
definition in $\Omega$) and for which $\xi_0(\omega^*) = 0$, where $\omega^*$ denotes the set of all boundary
points of $\omega$. We shall show that $\xi_0$ is a least favorable distribution. Assume
that it is not. Then there exists a decision function $\mathfrak{D}_0 = d_0(x)$ such that

(2.79)  $r(\xi_0, \mathfrak{D}_0) \le v - \delta,$

where $\delta > 0$ and $v$ denotes the common value of $\sup_{\xi} \inf_{\eta} r$ and $\inf_{\eta} \sup_{\xi} r$. It was
shown in the proof of Lemma 2.1 that (2.79) implies the existence of a decision
function $\mathfrak{D}_1 = d_1(x)$ and that of a positive integer $m$ such that

(2.80)  $n(x; \mathfrak{D}_1) \le m$ for all $x$
and 

(2.81) r(&,SDi) 

Since $c(x^1, \dots, x^r; \mathfrak{D}_1)$ and $W(F, d)$ are uniformly bounded and $W(F, d)$ is
continuous in $F$ uniformly in $d$, we have

(2.82)  $\lim_{i \to \infty} r(F_i, \mathfrak{D}_1) = r(F, \mathfrak{D}_1)$

for any sequence $\{F_i\}$ for which $F_i \to F$ in the ordinary sense. From (2.78),
(2.82) and the compactness of $\Omega$ (in the ordinary sense) it follows that





But this is in contradiction to (2.77) and, therefore, $\xi_0$ must be a least favorable
distribution. Hence, statement (i) of our theorem is proved.

Statement (ii) is an immediate consequence of Theorems 2.1, 2.3 and
statement (i) of Theorem 2.6.

To prove (iii), replace the weight function $W(F, d)$ by $W^*(F, d) =
W(F, d) - r(F, \eta_0)$ where $\eta_0$ satisfies the conditions imposed on it in (iii).

We shall show that (i) remains valid also when $W(F, d)$ is replaced by
$W^*(F, d)$. This is not clear, since $W^*(F, d)$ may not be continuous in $F$. First
we shall prove that

(2.84)  $\liminf_{i \to \infty} r(\xi_i, \eta_0) \ge r(\xi_0, \eta_0)$

for any sequence $\{\xi_i\}$ for which $\xi_i \to \xi_0$ in the ordinary sense, i.e., for which

(2.85)  $\lim_{i \to \infty} \xi_i(\omega) = \xi_0(\omega)$

for any open subset $\omega$ (open in the sense of ordinary convergence defined in $\Omega$)
whose boundary has probability measure zero according to $\xi_0$. For any sample
$x^1, \dots, x^r$ let $q_{ri}(x^1, \dots, x^r)$ denote the probability that the first $r$ observations
will be equal to $x^1, \dots, x^r$, respectively, when $\xi_i$ is the a priori distribution.

Clearly,

(2.86)  $q_{ri}(x^1, \dots, x^r) = \int_{\Omega} p_r(x^1, \dots, x^r \mid F)\, d\xi_i.$

Since $p_r(x^1, \dots, x^r \mid F)$ is a continuous function of $F$, we have

(2.87)  $\lim_{i \to \infty} q_{ri}(x^1, \dots, x^r) = q_{r0}(x^1, \dots, x^r).$

The function $r(\xi, \eta_0)$ can be split into two parts, i.e., $r(\xi, \eta_0) = r_1(\xi, \eta_0) + r_2(\xi, \eta_0)$
where $r_1$ is the expected value of the loss $W(F, d)$ and $r_2$ is the expected cost of
experimentation. Since $W(F, d)$ is a bounded function of $F$ and $d$, and since
$W(F, d)$ is continuous in $F$ uniformly in $d$, we have

(2.88)  $\lim_{i \to \infty} r_1(\xi_i, \eta_0) = r_1(\xi_0, \eta_0)$

for any sequence $\{\xi_i\}$ which satisfies (2.85). To prove (2.84), we merely have
to show that

(2.89)  $\liminf_{i \to \infty} r_2(\xi_i, \eta_0) \ge r_2(\xi_0, \eta_0).$

But

(2.90)  $r_2(\xi_i, \eta_0) = \sum_{r,\, x^1, \dots, x^r} q_{ri}(x^1, \dots, x^r) \int_{Q_{x^1 \dots x^r}} c(x^1, \dots, x^r; \mathfrak{D})\, d\eta_0$

where $Q_{x^1 \dots x^r}$ is the totality of all decision functions $d(x)$ with the property
that $n(y) = r$ for any $y$ whose first $r$ coordinates are equal to $x^1, \dots, x^r$,
respectively. Equation (2.89) is an immediate consequence of (2.87) and (2.90).
Hence, (2.84) is proved.





Let $r^*(\xi, \eta)$ be the risk function when $W(F, d)$ is replaced by $W^*(F, d)$, i.e.,
$r^*(\xi, \eta) = r(\xi, \eta) - r(\xi, \eta_0)$. Let, furthermore, $\{\xi_i^*\}$ be a sequence of a priori
distributions such that

(2.91)  $\lim_{i \to \infty} \inf_{\eta} r^*(\xi_i^*, \eta) = \sup_{\xi} \inf_{\eta} r^*(\xi, \eta).$

There exists a subsequence $\{\xi_{i_j}^*\}$ of the sequence $\{\xi_i^*\}$ such that $\xi_{i_j}^*$ converges (in
the ordinary sense) to a limit distribution $\xi_0^*$ as $j \to \infty$. We shall show that $\xi_0^*$
is a least favorable distribution. For suppose that $\xi_0^*$ is not a least favorable
distribution. Then there exists a decision function $\mathfrak{D}_0^* = d_0^*(x)$ such that

(2.92)  $r^*(\xi_0^*, \mathfrak{D}_0^*) \le v^* - \delta$

where $\delta > 0$ and $v^* = \sup_{\xi} \inf_{\eta} r^* = \inf_{\eta} \sup_{\xi} r^*$. But then there exists a decision
function $\mathfrak{D}_1^* = d_1^*(x)$ and a positive integer $m$ such that

(2.93)  $n(x; \mathfrak{D}_1^*) \le m$ for all $x$

and 


(2.94) 

r*(?«*, 3)?)gt>*-| 


Since $r^*(\xi, \mathfrak{D}_1^*) = r(\xi, \mathfrak{D}_1^*) - r(\xi, \eta_0)$, and since

$\lim_{j \to \infty} r(\xi_{i_j}^*, \mathfrak{D}_1^*) = r(\xi_0^*, \mathfrak{D}_1^*),$


it follows from (2.84) and (2.94) that 

(?.95) lim sup r*({?,, 35?) ^ t>* - 5 

which is in contradiction to (2.91). Hence, the validity of (i) is proved also
when $W(F, d)$ is replaced by $W^*(F, d)$. Clearly, also (ii) remains valid when
$W(F, d)$ is replaced by $W^*(F, d)$.

Let $\eta_1$ be a minimax solution relative to the problem corresponding to
$W^*(F, d)$. Then because of (ii), $\eta_1$ is a Bayes solution in the strict sense.
Since $\eta_0$ is not a Bayes solution in the strict sense, $\eta_1 \ne \eta_0$ and $v^* < 0$. Hence
$\eta_1$ is uniformly better than $\eta_0$. This completes the proof of Theorem 2.6.

We shall now replace Condition 2.6 by the following weaker one.

Condition 2.6*. There exists a sequence $\{\Omega_i\}$ ($i = 1, 2, \dots$, ad inf.) of
subsets of $\Omega$ such that Condition 2.6 is fulfilled when $\Omega$ is replaced by $\Omega_i$,
$\Omega_{i+1} \supset \Omega_i$ and $\lim_{i \to \infty} \Omega_i = \Omega$.

We shall say that $\eta_i$ converges weakly to $\eta$ as $i \to \infty$, if $\lim_{i \to \infty} \zeta(\eta_i) = \zeta(\eta)$.
We shall also say that $\eta$ is a weak limit of $\eta_i$. This limit definition seems to be
natural, since $r(\xi, \eta_1) = r(\xi, \eta_2)$ if $\zeta(\eta_1) = \zeta(\eta_2)$. We shall now prove the
following theorem:





Theorem 2.7. If $D$ is compact and if Conditions 2.1, 2.2, 2.4, 2.5 and 2.6* are
fulfilled, then:

(i) A minimax solution exists that is a weak limit of a sequence of Bayes
solutions in the strict sense.

(ii) Let $\eta_0$ be a decision rule for which $r(F, \eta_0)$ is a bounded function of $F$. Then
there exists a decision rule $\eta_1$ that is a weak limit of a sequence of Bayes solutions
in the strict sense and such that $r(F, \eta_1) \le r(F, \eta_0)$ for all $F$ in $\Omega$.

Proof: According to Theorem 2.6, there exists a decision rule $\eta_i$ that is a Bayes
solution in the strict sense and a minimax solution if $\Omega$ is replaced by $\Omega_i$. There
exists a subsequence $\{\eta_{i_j}\}$ ($j = 1, 2, \dots$, ad inf.) of the sequence $\{\eta_i\}$ such that
$\{\eta_{i_j}\}$ admits a weak limit. Let $\eta_0$ be a weak limit of $\{\eta_{i_j}\}$. Then, as shown in
the proof of Lemma 2.4, equation (2.60) holds and $\eta_0$ is a minimax solution
relative to the original space $\Omega$. Thus, statement (i) is proved.

To prove (ii), replace $W(F, d)$ by $W^*(F, d) = W(F, d) - r(F, \eta_0)$. According
to Theorem 2.6 there exists a decision rule $\eta_{1i}$ such that $\eta_{1i}$ is a minimax
solution and a Bayes solution in the strict sense when $\Omega$ is replaced by $\Omega_i$ and $W(F, d)$
by $W^*(F, d)$. Clearly, $\eta_{1i}$ remains a Bayes solution in the strict sense also
relative to $\Omega$ and $W(F, d)$. Since $\eta_{1i}$ is a minimax solution relative to $\Omega_i$ and
$W^*(F, d)$, we have

(2.96)  $r(F, \eta_{1i}) \le r(F, \eta_0)$ for all $F$ in $\Omega_i$.

Let $\{\eta_{1 i_j}\}$ be a subsequence of the sequence $\{\eta_{1i}\}$ such that $\{\eta_{1 i_j}\}$ admits a weak
limit $\eta_1$. Then, (2.60) holds for $\{\eta_{1 i_j}\}$ and $\eta_1$, and

(2.97)  $r(F, \eta_1) \le r(F, \eta_0)$ for all $F$ in $\Omega$.

Since $\eta_1$ is a weak limit of strict Bayes solutions, statement (ii) is proved.

3. Statistical decision functions: the case of continuous chance variables.

3.1. Introductory remarks. In this section we shall be concerned with the
case where the probability distribution $F$ of $X$ is absolutely continuous, i.e.,
for any element $F$ of $\Omega$ and for any positive integer $r$ there exists a joint density
function $p_r(x^1, \dots, x^r \mid F)$ of the first $r$ chance variables $X^1, \dots, X^r$.

The continuous case can immediately be reduced to the discrete case discussed
in Section 2 if the observations are not given exactly but only up to a finite
number of decimal places. More precisely, we mean this: For each $i$, let the real
axis $R$ be subdivided into a denumerable number of disjoint sets $R_{i1}, R_{i2}, \dots$,
ad inf. Suppose that the observed value $x^i$ of $X^i$ is not given exactly; it is merely
known which element of the sequence $\{R_{ij}\}$ ($j = 1, 2, \dots$, ad inf.) contains
$x^i$. This is the situation, for example, if the value of $x^i$ is given merely up to a
finite number, say $r$, decimal places ($r$ fixed, independent of $i$). This case can
be reduced to the previously discussed discrete case, since we can regard the
sets $R_{ij}$ as our points, i.e., we can replace the chance variable $X^i$ by $Y^i$ where
$Y^i$ can take only the values $R_{i1}, R_{i2}, \dots$, ad inf. ($Y^i$ takes the value $R_{ij}$ if $X^i$
falls in $R_{ij}$). If $W(F_1, d) = W(F_2, d)$ whenever the distribution of $Y$ under





One can easily verify that for any sequence of non-negative functions
$\{h_r(x^1, \dots, x^r, D^*)\}$ ($r = 1, 2, \dots$) satisfying (3.2) and (3.3) there exists exactly
one sequence $\{z_r(x^1, \dots, x^r)\}$ and one sequence $\{\delta_{x^1 \dots x^r}(D^*)\}$ such that (3.1)
is fulfilled. Thus, a randomized decision rule $\zeta$ can be given by a sequence
$\{h_r(x^1, \dots, x^r, D^*)\}$ satisfying (3.2) and (3.3). The functions $z_r(x^1, \dots, x^r)$ and
$\delta_{x^1 \dots x^r}$ need be defined only for samples $x^1, \dots, x^r$ for which $z_i(x^1, \dots, x^i) > 0$
for $i = 1, \dots, r - 1$. The above mentioned uniqueness of $z_r(x^1, \dots, x^r)$
and $\delta_{x^1 \dots x^r}$ was meant to hold if the definition of these functions is restricted
to such samples $x^1, \dots, x^r$.

For any bounded subset $S_r$ of the $r$-dimensional sample space, let

(3.4)  $H_r(S_r, D^*) = \int_{S_r} h_r(x^1, \dots, x^r, D^*)\, dx^1 \cdots dx^r.$

Let $\{\zeta_i\}$ ($i = 0, 1, 2, \dots$, ad inf.) be a sequence of decision rules,
and let $H_{r,i}(S_r, D^*)$ be the function $H_r(S_r, D^*)$ corresponding to $\zeta_i$. We shall
say that

(3.5)  $\lim_{i \to \infty} \zeta_i = \zeta_0$

if

(3.6)  $\lim_{i \to \infty} H_{r,i}(S_r, D^*) = H_{r,0}(S_r, D^*)$

for any $r$, any bounded set $S_r$ and for any $D^*$ that is an element of a sequence
$\{D_{k_1 \dots k_l}\}$ ($k_j = 1, \dots, r_j$; $j = 1, \dots, l$; $l = 1, 2, \dots$, ad inf.) of subsets
of $D$ satisfying the following conditions:

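(3.7)  $\sum_{k_1 = 1}^{r_1} D_{k_1} = D; \quad \sum_{k_l = 1}^{r_l} D_{k_1 \dots k_l} = D_{k_1 \dots k_{l-1}},$

(3.8)  $D_{k_1 \dots k_{l-1} 1}, \dots, D_{k_1 \dots k_{l-1} r_l}$ are disjoint,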

and 

(3.9)  the diameter of $D_{k_1 \dots k_l}$ converges to zero as $l \to \infty$ uniformly in $k_1, \dots, k_l$.
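A concrete family of subsets of the kind required by (3.7)-(3.9) is easy to write down when the decision space is an interval. The sketch below is an illustration under assumptions that are not in the text: it takes $D = [0, 1)$, splits every cell into two halves at each step, and represents a cell by its endpoints as a half-open interval. The function name `dyadic_cells` is hypothetical.

```python
from fractions import Fraction


def dyadic_cells(level):
    """Cells D_{k_1...k_l} of [0, 1) at depth l = level, keyed by index tuples with k_j in {1, 2}.

    Each value (lo, hi) stands for the half-open interval [lo, hi).
    """
    cells = {(): (Fraction(0), Fraction(1))}
    for _ in range(level):
        refined = {}
        for key, (lo, hi) in cells.items():
            mid = (lo + hi) / 2
            refined[key + (1,)] = (lo, mid)  # left half of the parent cell
            refined[key + (2,)] = (mid, hi)  # right half of the parent cell
        cells = refined
    return cells
```

At depth $l$ the cells are disjoint, each parent cell is the union of its two children, and every cell has diameter $2^{-l}$, so the diameters tend to zero uniformly in the indices.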

Lemma 3.1. For any sequence $\{\zeta_i\}$ ($i = 1, 2, \dots$, ad inf.) of decision rules
there exists a subsequence $\{\zeta_{i_j}\}$ ($j = 1, 2, \dots$, ad inf.) and a decision rule $\zeta_0$ such
that $\lim_{j \to \infty} \zeta_{i_j} = \zeta_0$.

Proof: Let $\{H_{r,i}(S_r, D^*)\}$ ($r = 1, 2, \dots$, ad inf.) be the sequence of functions
associated with $\zeta_i$. Let, furthermore, $\{D_{k_1 \dots k_l}\}$ be a sequence of subsets of $D$
satisfying the relations (3.7), (3.8) and (3.9). Clearly, for any fixed $r$ and any
fixed element $D_{k_1 \dots k_l}$ of the sequence $\{D_{k_1 \dots k_l}\}$ it is possible to find a
subsequence $\{i_j\}$ ($j = 1, 2, \dots$, ad inf.) of the sequence $\{i\}$ (the subsequence $\{i_j\}$
may depend on $r$ and $D_{k_1 \dots k_l}$) and a set function $H_{r,0}(S_r)$ such that

(3.10)  $\lim_{j \to \infty} H_{r,i_j}(S_r, D_{k_1 \dots k_l}) = H_{r,0}(S_r).$

Using the well known diagonal procedure, it is therefore possible to find a fixed
subsequence $\{i_j\}$ (independent of $r$ and $D^*$) and a sequence of set functions
$\{H_{r,0}(S_r, D_{k_1 \dots k_l})\}$ such that

(3.11)  $\lim_{j \to \infty} H_{r,i_j}(S_r, D_{k_1 \dots k_l}) = H_{r,0}(S_r, D_{k_1 \dots k_l})$

for all values of $r$, $k_1, \dots, k_l$ and $l$.

To complete the proof of Lemma 3.1, it remains to be shown that
there exists a decision rule $\zeta_0$ such that the associated function $H_r(S_r, D^*)$ is
equal to $H_{r,0}(S_r, D^*)$ for any $D^*$ that is an element of $\{D_{k_1 \dots k_l}\}$. Since
$h_{r,i}(x^1, \dots, x^r, D^*)$ is uniformly bounded, the set function $H_{r,0}(S_r, D_{k_1 \dots k_l})$ is
absolutely continuous. Hence for any values of $k_1, \dots, k_l$ there exists a
function $h_{r,0}(x^1, \dots, x^r, D_{k_1 \dots k_l})$ such that

(3.12)  $\int_{S_r} h_{r,0}(x^1, \dots, x^r, D_{k_1 \dots k_l})\, dx^1 \cdots dx^r = H_{r,0}(S_r, D_{k_1 \dots k_l}).$

The existence of a $\zeta_0$ with the desired property is proved, if we show that the
functions $h_{r,0}(x^1, \dots, x^r, D_{k_1 \dots k_l})$ satisfy the relations (3.2) and (3.3). Let
$h_r(x^1, \dots, x^m, D^*) = h_r(x^1, \dots, x^r, D^*)$ for any $m > r$. Then, since the
functions $h_{r,i}$ satisfy (3.2), we have

(3.13)  $\sum_{r=1}^{m} H_{r,i}(S_m, D^*) \le V(S_m)$

where $V(S_m)$ denotes the $m$-dimensional Lebesgue measure of $S_m$. From (3.13)
it follows that

(3.14)  $\sum_{r=1}^{m} H_{r,0}(S_m, D_{k_1 \dots k_l}) \le V(S_m).$

Hence, the functions $h_{r,0}(x^1, \dots, x^r, D_{k_1 \dots k_l})$ must satisfy (3.2) except perhaps
on a set of Lebesgue measure zero. Since the functions $h_{r,i}(x^1, \dots, x^r, D^*)$ satisfy
(3.3), we must have

(3.15)  $H_{r,i}(S_r, D_{k_1 \dots k_{l-1}}) = \sum_{k_l = 1}^{r_l} H_{r,i}(S_r, D_{k_1 \dots k_l}).$

Hence, the same relation must hold also for $H_{r,0}(S_r, D_{k_1 \dots k_l})$. But this implies
that the functions $h_{r,0}(x^1, \dots, x^r, D_{k_1 \dots k_l})$ satisfy (3.3) except perhaps on a set
of Lebesgue measure zero, and the proof of Lemma 3.1 is completed.

Lemma 3.2. Let $T_i(S)$ ($i = 0, 1, 2, \dots$) be a non-negative, completely additive
set function defined for all measurable subsets $S$ of the $r$-dimensional sample space
$M_r$. Assume that

(3.16)  $T_i(S) \le V(S)$

for all $S$ ($i = 0, 1, 2, \dots$, ad inf.) where $V(S)$ denotes the Lebesgue measure of $S$.
Let, furthermore, $g(x^1, \dots, x^r)$ be a non-negative function such that

(3.17)  $\int_{M_r} g(x^1, \dots, x^r)\, dx^1 \cdots dx^r < \infty.$

If

(3.18)  $\lim_{i \to \infty} T_i(S) = T_0(S)$

then

(3.19)  $\lim_{i \to \infty} \int_{M_r} g(x^1, \dots, x^r)\, dT_i = \int_{M_r} g(x^1, \dots, x^r)\, dT_0.$

Proof: Let $M_{r,c}$ be the sphere in $M_r$ with center at the origin and radius $c$.
Clearly,

(3.20)  $\lim_{c \to \infty} \int_{M_{r,c}} g(x^1, \dots, x^r)\, dx^1 \cdots dx^r = \int_{M_r} g(x^1, \dots, x^r)\, dx^1 \cdots dx^r.$

Hence, because of (3.16), we have

(3.21)  $\lim_{c \to \infty} \left[ \int_{M_{r,c}} g(x^1, \dots, x^r)\, dT_i - \int_{M_r} g(x^1, \dots, x^r)\, dT_i \right] = 0$

uniformly in $i$. Hence our lemma is proved if we show that

(3.22)  $\lim_{i \to \infty} \int_{M_{r,c}} g(x^1, \dots, x^r)\, dT_i = \int_{M_{r,c}} g(x^1, \dots, x^r)\, dT_0$

for any finite $c$. Let $g_A(x^1, \dots, x^r) = g(x^1, \dots, x^r)$ when $g(x^1, \dots, x^r) \le A$,
and $= 0$ otherwise. Since

$\lim_{A \to \infty} \int_{M_{r,c}} (g - g_A)\, dx^1 \cdots dx^r = 0$

it follows from (3.16) that

(3.23)  $\lim_{A \to \infty} \int_{M_{r,c}} (g - g_A)\, dT_i = 0$

uniformly in $i$. Hence, our lemma is proved if we can show that

(3.24)  $\lim_{i \to \infty} \int_{M_{r,c}} g_A\, dT_i = \int_{M_{r,c}} g_A\, dT_0$

for any $c > 0$ and any $A > 0$. Let $S_j$ denote the set of all points in $M_{r,c}$ for
which

(3.25)  $(j - 1)\epsilon \le g_A < j\epsilon$

where $\epsilon$ is a given positive number. We have

(3.26)  $\sum_{j} (j - 1)\epsilon \int_{S_j} dT_i \le \int_{M_{r,c}} g_A\, dT_i \le \sum_{j} j\epsilon \int_{S_j} dT_i$  ($i = 0, 1, 2, \dots$).

Since for any $\epsilon$, $j$ can take only a finite number of values, and since $\epsilon$ can be
chosen arbitrarily small, our lemma follows easily from (3.18) and (3.26).
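The sandwich (3.26) is the step that turns convergence of the set functions $T_i$ on the finitely many level sets $S_j$ into convergence of the integrals. A minimal numerical sketch, under the assumption (not made in the text) that $T$ is a discrete measure with finitely many atoms, shows that the lower and upper sums differ by exactly $\epsilon$ times the total mass. The function name `layer_bounds` and its arguments are hypothetical.

```python
from fractions import Fraction
from math import floor


def layer_bounds(values, weights, eps):
    """Return (lower, exact, upper) for the integral of g against a discrete measure T.

    values[k] is g at atom k (non-negative) and weights[k] is the mass T puts there.
    Atom k lies in the level set S_j with j = floor(g / eps) + 1,
    i.e. (j - 1) * eps <= g < j * eps, as in (3.25).
    """
    lower = exact = upper = Fraction(0)
    for g, w in zip(values, weights):
        j = floor(g / eps) + 1
        lower += (j - 1) * eps * w
        exact += g * w
        upper += j * eps * w
    return lower, exact, upper
```

Since the two bounds depend on $T$ only through the masses $T(S_j)$, they converge whenever those masses do, and shrinking `eps` closes the gap between them.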
Lemma 3.3. Let $\{\zeta_i\}$ be a sequence of decision rules such that $\lim_{i \to \infty} \zeta_i = \zeta_0$ and
$r(F, \zeta_i)$ is a bounded function of $F$ and $i$ $(i \ge 1)$. Then

(3.27)  $\liminf_{i \to \infty} r(F, \zeta_i) \ge r(F, \zeta_0).$

Proof: First we shall show that it is sufficient to prove Lemma 3.3 for any
finite space $D$. For this purpose, assume that Lemma 3.3 is true for any finite
decision space, but there exists a non-finite compact decision space $D$ and a
sequence $\{\zeta_i\}$ such that $\lim_{i \to \infty} \zeta_i = \zeta_0$ and

(3.28)  $\liminf_{i \to \infty} r(F, \zeta_i) = r(F, \zeta_0) - \delta$ for some $F$ $(\delta > 0)$.

Since $\zeta_i \to \zeta_0$, there exists a sequence $\{D_{k_1 \dots k_l}\}$ of subsets of $D$ satisfying the
conditions (3.7)-(3.9) and such that

(3.29)  $\lim_{i \to \infty} H_{r,i}(S_r, D_{k_1 \dots k_l}) = H_{r,0}(S_r, D_{k_1 \dots k_l})$

where $H_{r,i}(S_r, D^*)$ is the function $H_r$ associated with $\zeta_i$ ($i = 0, 1, 2, \dots$). Let
$\lambda$ be a fixed value of $l$ and consider the corresponding finite sequence $\{D_{k_1 \dots k_\lambda}\}$
of subsets of $D$. Let $k$ be the number of elements in this finite sequence. We
select one point from each element of the finite sequence $\{D_{k_1 \dots k_\lambda}\}$. Let the
points selected be $d_1, d_2, \dots, d_k$ and let $\bar{D}$ denote the set consisting of the points
$d_1, \dots, d_k$. Let $\bar{\zeta}_i$ be the decision rule defined as follows: the function
$\bar{h}_r(x^1, \dots, x^r, d_j)$ associated with $\bar{\zeta}_i$ is equal to $h_{r,i}(x^1, \dots, x^r, D^*)$ where $D^*$ is
equal to the element of the finite sequence $\{D_{k_1 \dots k_\lambda}\}$ which contains the point
$d_j$ ($j = 1, \dots, k$). Clearly, because of (3.29),

(3.30)  $\lim_{i \to \infty} \bar{\zeta}_i = \bar{\zeta}_0.$

Furthermore, for sufficiently large $\lambda$ we obviously have

(3.31)  $| r(F, \bar{\zeta}_i) - r(F, \zeta_i) | \le \epsilon$ for $i = 0, 1, 2, \dots$, ad inf.

Since for finite $D$ our lemma is assumed to be true, we have

(3.32)  $\liminf_{i \to \infty} r(F, \bar{\zeta}_i) \ge r(F, \bar{\zeta}_0).$

g 

Choosing c ^ x, we obtain a contradiction from (3.28), (3.31) and (3.32). Thus^ 

O 

it is sufficient to prove Lemma 3.3 for finite $D$. In the remainder of the proof
we shall assume that $D$ consists of the points $d_1, \dots, d_k$.

The probability that we shall take exactly $m$ observations when $\zeta_i$ is used and
$F$ is true is given by

(3.33)  $\text{prob}\{n = m \mid F, \zeta_i\} = \int_{M_m} p_m(x^1, \dots, x^m \mid F)\, h_{m,i}(x^1, \dots, x^m, D)\, dx^1 \cdots dx^m$

where $M_m$ denotes the $m$-dimensional sample space. Since

$\lim_{i \to \infty} H_{m,i}(S_m, D) = H_{m,0}(S_m, D),$

it follows from Lemma 3.2 that

(3.34)  $\lim_{i \to \infty} \text{prob}\{n = m \mid F, \zeta_i\} = \text{prob}\{n = m \mid F, \zeta_0\}.$

Hence

(3.35)  $\lim_{i \to \infty} \text{prob}\{n \le m \mid F, \zeta_i\} = \text{prob}\{n \le m \mid F, \zeta_0\}.$

Since $r(F, \zeta_i)$ is a bounded function of $F$ and $i$ $(i \ge 1)$, we must have

(3.36)  $\lim_{m \to \infty} \text{prob}\{n \le m \mid F, \zeta_i\} = 1$  ($i = 1, 2, \dots$)

uniformly in $F$ and $i$. From (3.35) and (3.36) it follows that

(3.37)  $\lim_{m \to \infty} \text{prob}\{n \le m \mid F, \zeta_0\} = 1$

uniformly in $F$. Because of (3.36) and (3.37), we have

(3.38)  $r(F, \zeta_i) = \sum_{m=1}^{\infty} r_m(F, \zeta_i)$  ($i = 0, 1, 2, \dots$, ad inf.),

where

(3.39)  $r_m(F, \zeta_i) = \sum_{j=1}^{k} \int_{M_m} p_m(x^1, \dots, x^m \mid F)\, W(F, d_j)\, dH_{m,i}(S_m, d_j) + \int_{M_m} p_m(x^1, \dots, x^m \mid F)\, c(x^1, \dots, x^m)\, dH_{m,i}(S_m, D).$

Since

$\lim_{i \to \infty} H_{m,i}(S_m, D^*) = H_{m,0}(S_m, D^*)$

for any subset $D^*$ of $D$, it follows from Lemma 3.2 that

(3.40)  $\lim_{i \to \infty} r_m(F, \zeta_i) = r_m(F, \zeta_0).$

Lemma 3.3 is an immediate consequence of (3.38) and (3.40).

3.4. Equality of Sup Inf r and Inf Sup r, and other theorems. In this section
we shall prove the main theorems for the continuous case, using the lemmas
derived in the preceding section.

Theorem 3.1. If Conditions 3.1-3.5 are fulfilled, then

(3.41)  $\sup_{\xi} \inf_{\zeta} r(\xi, \zeta) = \inf_{\zeta} \sup_{\xi} r(\xi, \zeta).$

Proof: Let $Z^m$ denote the class of all $\zeta$'s for which $\text{prob}\{n \le m \mid \zeta, F\} = 1$
for all $F$. We shall denote an element of $Z^m$ by $\zeta^m$. First we shall show that it
is sufficient if for any finite $m$ we can prove Theorem 3.1 under the restriction
that $\zeta$ must be an element of $Z^m$. For this purpose, put $W_0 = \sup_{F, d} W(F, d)$
and choose a positive integer $m_\epsilon$ so that

(3.42)  $c(x^1, \dots, x^m) > W_0^2 / \epsilon$

for all $m \ge m_\epsilon$. The existence of such a value $m_\epsilon$ follows from Condition 3.4.
We shall now show that for any $\xi$ we have

(3.43)  $\inf_{\zeta^m} r(\xi, \zeta^m) \le \inf_{\zeta} r(\xi, \zeta) + \epsilon$ for any $m \ge m_\epsilon$.

Let $\zeta_1$ be any decision rule. There are two cases to be considered:
(a) $\text{prob}\{n \ge m_\epsilon \mid \xi, \zeta_1\} \ge \epsilon / W_0$; (b) $\text{prob}\{n \ge m_\epsilon \mid \xi, \zeta_1\} < \epsilon / W_0$.
In case (a) we have $r(\xi, \zeta_1) \ge W_0$, since a cost exceeding $W_0^2 / \epsilon$ is incurred with
probability at least $\epsilon / W_0$. In this case, let $\zeta_2$ be the rule that we decide for some
$d$ without taking any observations. Clearly, we shall have $r(\xi, \zeta_2) \le W_0$ and, therefore,
$r(\xi, \zeta_2) \le r(\xi, \zeta_1)$. In case (b), let $\zeta_2$ be defined as follows: $h_r(x^1, \dots, x^r, D^*)$
for $\zeta_2$ is the same as that for $\zeta_1$ when $r < m_\epsilon$, and $h_r(x^1, \dots, x^r, d_0)$ for $\zeta_2$ is equal
to $1 - \sum_{i=1}^{m_\epsilon - 1} h_i(x^1, \dots, x^i, D)$ when $r = m_\epsilon$, and zero when $r > m_\epsilon$, where $d_0$ is a
fixed element of $D$. Since $\text{prob}\{n \ge m_\epsilon \mid \xi, \zeta_1\} < \epsilon / W_0$, we have

$r(\xi, \zeta_2) \le r(\xi, \zeta_1) + \epsilon.$

In both cases $\zeta_2$ is an element of $Z^{m_\epsilon}$. Hence (3.43) is proved. From (3.43) we
obtain

(3.44)  $\sup_{\xi} \inf_{\zeta} r \le \sup_{\xi} \inf_{\zeta^m} r \le \sup_{\xi} \inf_{\zeta} r + \epsilon.$
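The truncation point $m_\epsilon$ in (3.42) is easy to compute once a cost function is fixed. The sketch below is only an illustration: it assumes a constant cost per observation, so that $c(x^1, \dots, x^m) = m \cdot \text{cost\_per\_obs}$, which is one simple way of meeting a condition of the type used here and is not taken from the text. The function name and parameters are hypothetical.

```python
from fractions import Fraction
from math import floor


def truncation_point(w0, eps, cost_per_obs):
    """Smallest m with cost_per_obs * m > w0**2 / eps, assuming a linear cost of experimentation."""
    threshold = Fraction(w0) ** 2 / (Fraction(eps) * Fraction(cost_per_obs))
    # The smallest integer strictly greater than `threshold`.
    return floor(threshold) + 1
```

For example, with $W_0 = 1$, $\epsilon = 1/10$ and a cost of $1/100$ per observation, any rule may be cut off after 1001 observations at an increase in risk of at most $\epsilon$.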

Assume now that

(3.45)  $\sup_{\xi} \inf_{\zeta^m} r = \inf_{\zeta^m} \sup_{\xi} r$

holds for any $m$. From (3.44) and (3.45) we obtain

(3.46)  $\inf_{\zeta^m} \sup_{\xi} r \le \sup_{\xi} \inf_{\zeta} r + \epsilon.$

Hence

(3.47)  $\inf_{\zeta} \sup_{\xi} r \le \sup_{\xi} \inf_{\zeta} r + \epsilon.$

Since this is true for any $\epsilon$, we have

(3.48)  $\inf_{\zeta} \sup_{\xi} r \le \sup_{\xi} \inf_{\zeta} r.$

Theorem 3.1 follows from (3.48) and Lemma 1.3.





To complete the proof of Theorem 3.1, it remains to be shown that (3.45)
holds for any $m$. Since $D$ is compact, (3.45) is proved if we can prove it for any
finite $D$. In the remainder of the proof we shall, therefore, assume that $D$
consists of $k$ points $d_1, \dots, d_k$. Let $\omega$ be a subset of $\Omega$ that is conditionally compact
in the sense of the metric

(3.49)  $\delta_0(F_1, F_2) = \sup_{S_m} \left| \int_{S_m} dF_1 - \int_{S_m} dF_2 \right|$

where $S_m$ is a subset of the $m$-dimensional sample space. We shall show that $\omega$
is conditionally compact also in the sense of the intrinsic metric given by

(3.50)  $\delta_1(F_1, F_2) = \sup_{\zeta^m} | r(F_1, \zeta^m) - r(F_2, \zeta^m) |.$

Let

(3.51)  $\delta_2(F_1, F_2) = \sup_{d} | W(F_1, d) - W(F_2, d) |$

and

(3.52)  $\delta_3(F_1, F_2) = \delta_0(F_1, F_2) + \delta_2(F_1, F_2).$

It follows from Condition 3.3 and Theorem 3.1 in [3] that $\Omega$, and therefore
also $\omega$, is conditionally compact in the sense of the metric $\delta_2(F_1, F_2)$. Hence $\omega$
is conditionally compact in the sense of the metric $\delta_3(F_1, F_2)$. The conditional
compactness of $\omega$ relative to the metric $\delta_1(F_1, F_2)$ is proved, if we can show that
any sequence $\{F_i\}$ that is a Cauchy sequence relative to the metric $\delta_3$ is a Cauchy
sequence also relative to the metric $\delta_1$. Let $\{F_i\}$ ($i = 1, 2, \dots$, ad inf.) be a
Cauchy sequence relative to $\delta_3$. Then there exists a distribution $F_0$ (not
necessarily an element of $\Omega$) and a function $W(d)$ such that

(3.53)  $\lim_{i \to \infty} W(F_i, d) = W(d)$ uniformly in $d$

and

(3.54)  $\lim_{i \to \infty} \int_{S_m} dF_i = \int_{S_m} dF_0$

uniformly in $S_m$.

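We have

(3.55)  $r(F_i, \zeta^m) = \sum_{r=1}^{m} \sum_{j=1}^{k} \int_{M_r} p_r(x^1, \dots, x^r \mid F_i)\, W(F_i, d_j)\, h_r(x^1, \dots, x^r, d_j)\, dx^1 \cdots dx^r + \sum_{r=1}^{m} \int_{M_r} p_r(x^1, \dots, x^r \mid F_i)\, c(x^1, \dots, x^r)\, h_r(x^1, \dots, x^r, D)\, dx^1 \cdots dx^r$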





where $M_r$ denotes the $r$-dimensional sample space. The sequence $\{F_i\}$ is a
Cauchy sequence relative to the metric $\delta_1$ if there exists a function $r(\zeta^m)$ such that

(3.56)  $\lim_{i \to \infty} r(F_i, \zeta^m) = r(\zeta^m)$

uniformly in $\zeta^m$. Let $\bar{r}(F_i, \zeta^m)$ be the function we obtain from $r(F_i, \zeta^m)$ by
replacing the factor $W(F_i, d_j)$ by $W(d_j)$ under the first integral on the right
hand side of (3.55). Because of (3.53), we have

(3.57)  $\lim_{i \to \infty} [r(F_i, \zeta^m) - \bar{r}(F_i, \zeta^m)] = 0$

uniformly in $\zeta^m$. Thus, (3.56) is proved if we can show the existence of a
function $\bar{r}(\zeta^m)$ such that

(3.58)  $\lim_{i \to \infty} \bar{r}(F_i, \zeta^m) = \bar{r}(\zeta^m)$

uniformly in $\zeta^m$. Let $C$ be a class of functions $\varphi(x^1, \dots, x^m)$ such that
$|\varphi(x^1, \dots, x^m)|$ is less than a fixed finite constant for all $\varphi$ in $C$.

It then follows from (3.54) that there exists a functional $g(\varphi)$ such that

(3.59)  $\lim_{i \to \infty} \int_{M_m} \varphi\, dF_i = g(\varphi)$

uniformly in $\varphi$. Application of this general result yields (3.58) immediately.
Hence, $\{F_i\}$ is a Cauchy sequence relative to the metric $\delta_1$ and, therefore, $\omega$ is
shown to be conditionally compact relative to the metric $\delta_1$ if it is so relative to
the metric $\delta_0$.

It then follows from Theorem 3.2 in [3] that $\sup_{\xi} \inf_{\zeta^m} r = \inf_{\zeta^m} \sup_{\xi} r$ if we replace
$\Omega$ by a subset $\omega$ that is conditionally compact relative to $\delta_0$.* Since $\Omega$ is separable
relative to $\delta_0$, there exists a sequence $\{\Omega_i\}$ of subsets of $\Omega$ such that $\Omega_i$ is
conditionally compact relative to $\delta_0$, $\Omega_{i+1} \supset \Omega_i$ and $\sum_i \Omega_i = \Omega^*$ is dense in $\Omega$. Let
$\xi^i$ denote an a priori distribution $\xi$ for which $\xi(\Omega_i) = 1$. Since the left and right
hand members in (3.45) remain unchanged when $\Omega$ is replaced by $\Omega^*$, it follows
from Theorem 1.3 that equation (3.45) is proved if we can show that

(3.60)  $\lim_{i \to \infty} \inf_{\zeta^m} \sup_{\xi^i} r = \inf_{\zeta^m} \sup_{\xi} r.$

Let $\{\zeta_i^m\}$ ($i = 1, 2, \dots$, ad inf.) be a sequence of decision rules such that

(3.61)  $\lim_{i \to \infty} [\sup_{\xi^i} r(\xi^i, \zeta_i^m) - \inf_{\zeta^m} \sup_{\xi^i} r] = 0.$

* Strictly, we would have to write $\inf_{\eta^m}$ instead of $\inf_{\zeta^m}$ where $\eta^m$ is a probability
measure in the space of all $\zeta^m$. But, since the use of any discrete probability measure
$\eta^m$ is equivalent to the use of a $\zeta^m$ and since the restriction to discrete $\eta^m$ does not
change $\sup_{\xi} \inf r$ or $\inf \sup_{\xi} r$, we can replace $\inf_{\eta^m}$ by $\inf_{\zeta^m}$.





According to Lemmas 3.1 and 3.3, there exists a subsequence $\{i_j\}$ of $\{i\}$ and
a decision rule $\zeta_0$ such that

(3.62) $\liminf_{j \to \infty} r(F, \zeta^0_{i_j}) \ge r(F, \zeta_0)$ for all $F$.

Since $\Omega^*$ is dense in $\Omega$, it follows from (3.61) and (3.62) that

(3.63) $\sup_F r(F, \zeta_0) \le \lim_{i \to \infty} \inf_{\zeta} \sup_{\xi^i} r(\xi^i, \zeta)$

and, therefore, (3.60) holds. Thus, (3.45) is proved and the proof of Theorem
3.1 is completed.

Theorem 3.2. If Conditions 3.1-3.5 are fulfilled, then there exists a minimax
solution, i.e., a decision rule $\zeta_0$ for which

(3.64) $\sup_F r(F, \zeta_0) \le \sup_F r(F, \zeta)$ for all $\zeta$.

Proof: Because of Theorem 3.1 there exists a sequence $\{\zeta_i\}$ ($i = 1, 2, \ldots$, ad
inf.) of decision rules such that

(3.65) $\lim_{i \to \infty} \sup_F r(F, \zeta_i) = \inf_{\zeta} \sup_F r(F, \zeta)$.

According to Lemmas 3.1 and 3.3 there exists a subsequence $\{\zeta_{i_j}\}$ of $\{\zeta_i\}$ and a
decision rule $\zeta_0$ such that

(3.66) $\liminf_{j \to \infty} r(F, \zeta_{i_j}) \ge r(F, \zeta_0)$ for all $F$.

It follows from (3.65) and (3.66) that $\zeta_0$ is a minimax solution and Theorem
3.2 is proved.

Theorem 3.3. If Conditions 3.1-3.5 are fulfilled, then for any $\xi$ there exists a
Bayes solution relative to $\xi$.

This theorem is an immediate consequence of Lemmas 3.1 and 3.3. 

Theorem 3.4. If Conditions 3.1-3.5 are fulfilled, then the class of all Bayes
solutions in the wide sense is a complete class.

The proof is omitted, since it is entirely analogous to that of Theorem 2.5. 

3.5. Formulation of an additional condition. In this section we shall formulate
an additional condition which will permit the derivation of some stronger
theorems. Let the metric $\delta_0(F_1, F_2)$ be defined by

«o(Fi,Fj) = Z^Supl f dFi- f dF,\ 

m»l fH Sm JSn Jam 

where Sm may be any subset of the m-dimensional sample space. 

Condition 3.6. The space $\Omega$ is compact relative to the metric $\delta_0(F_1, F_2)$ and

$\lim_{i \to \infty} W(F_i, d) = W(F_0, d)$

uniformly in $d$ if $\lim_{i \to \infty} \delta_0(F_i, F_0) = 0$.

Theorem 3.5. If Conditions 3.1-3.6 hold, then 





(i) there exists a least favorable a priori distribution;

(ii) any minimax solution is a Bayes solution in the strict sense;

(iii) for any decision rule $\zeta_0$ which is not a Bayes solution in the strict sense and
for which $r(F, \zeta_0)$ is a bounded function of $F$ there exists a decision rule $\zeta_1$ which is a
Bayes solution in the strict sense and is uniformly better than $\zeta_0$.

Proof: The proofs of (i) and (ii) are entirely analogous to those of (i) and (ii) 
in Theorem 2.6, and will therefore be omitted here. 

To prove (iii), let $\zeta_0$ be a decision rule that is not a Bayes solution in the strict
sense and for which $r(F, \zeta_0)$ is bounded. We replace the weight function $W(F, d)$
by $W^*(F, d) = W(F, d) - r(F, \zeta_0)$. We shall show that (i) remains valid when
$W(F, d)$ is replaced by $W^*(F, d)$. This is not obvious, since $r(F, \zeta_0)$, and therefore
also $W^*(F, d)$, may not be continuous in $F$. First we shall prove that

(3.67) $\liminf_{i \to \infty} r(\xi_i, \zeta_0) \ge r(\xi_0, \zeta_0)$

for any sequence $\{\xi_i\}$ for which

$\lim_{i \to \infty} \xi_i(\omega) = \xi_0(\omega)$

for any open subset $\omega$ of $\Omega$ (in the sense of the metric $\delta_0$) whose boundary has
probability measure zero according to $\xi_0$. Let $r_m(F, \zeta)$ denote the conditional
expected value of the loss $W(F, d)$ plus the cost of experimentation when $n = m$,
$F$ is true and the rule $\zeta$ is used by the statistician (see equation (3.39)). Since
$W(F, d)$ and the cost of experimentation when $m$ observations are taken are
uniformly bounded, one can easily verify that

(3.68) $\lim_{i \to \infty} r_m(F_i, \zeta_0) = r_m(F_0, \zeta_0)$

for any sequence $\{F_i\}$ for which

(3.69) $\lim_{i \to \infty} \delta_0(F_i, F_0) = 0$.

Hence, since $\Omega$ is compact (Condition 3.6),

(3.70) $\lim_{i \to \infty} r_m(\xi_i, \zeta_0) = r_m(\xi_0, \zeta_0)$

where

(3.71) $r_m(\xi, \zeta_0) = \int_{\Omega} r_m(F, \zeta_0)\, d\xi$.

Since

$r(\xi, \zeta_0) = \sum_{m=1}^{\infty} r_m(\xi, \zeta_0)$,

inequality (3.67) follows from (3.70).

The remainder of the proof of (iii) will be omitted here, since it is the same 
as that of (iii) in Theorem 2.6. 





We shall now replace Condition 3.6 by the following weaker one. 

Condition 3.6*. There exists a sequence $\{\Omega_i\}$ ($i = 1, 2, \ldots$, ad inf.) of subsets
of $\Omega$ such that Condition 3.6 is fulfilled when $\Omega$ is replaced by $\Omega_i$, $\Omega_{i+1} \supset \Omega_i$ and
$\lim_{i \to \infty} \Omega_i = \Omega$.

Theorem 3.6. If Conditions 3.1-3.5 and 3.6* are fulfilled then

(i) A minimax solution $\zeta_0$ and a sequence $\{\zeta_i\}$ ($i = 1, 2, \ldots$, ad inf.) exist
such that $\lim_{i \to \infty} \zeta_i = \zeta_0$ and $\zeta_i$ ($i = 1, 2, \ldots$, ad inf.) is a Bayes solution in the strict
sense.

(ii) For any decision rule $\zeta_0$ for which $r(F, \zeta_0)$ is bounded there exists another
decision rule $\zeta_1$ such that $\zeta_1$ is a limit of a sequence of Bayes solutions in the strict
sense and $r(F, \zeta_1) \le r(F, \zeta_0)$ for all $F$ in $\Omega$.

Proof: According to Theorem 3.5, for each $i$ there exists a decision rule $\zeta_i$
($i = 1, 2, \ldots$, ad inf.) such that $\zeta_i$ is a minimax solution and a Bayes solution
in the strict sense when $\Omega$ is replaced by $\Omega_i$. Let $\{\zeta_{i_j}\}$ be a subsequence of the
sequence $\{\zeta_i\}$ such that $\{\zeta_{i_j}\}$ admits a limit $\zeta_0$, i.e., $\lim_{j \to \infty} \zeta_{i_j} = \zeta_0$. Because of
Lemma 3.3,

(3.72) $\liminf_{j \to \infty} r(F, \zeta_{i_j}) \ge r(F, \zeta_0)$.

Hence $\zeta_0$ is a minimax solution relative to the original space $\Omega$ and statement
(i) is proved.

To prove (ii), replace $W(F, d)$ by $W^*(F, d) = W(F, d) - r(F, \zeta_0)$ where $\zeta_0$
is a decision rule for which $r(F, \zeta_0)$ is bounded. In proving statement (iii) of
Theorem 3.5, we have shown that there exists a decision rule $\zeta_{1i}$ ($i = 1, 2, \ldots$,
ad inf.) such that $\zeta_{1i}$ is a minimax solution and a Bayes solution in the strict sense
when $\Omega$ is replaced by $\Omega_i$ and $W(F, d)$ by $W^*(F, d)$. Clearly, $\zeta_{1i}$ remains a
Bayes solution in the strict sense also relative to $\Omega$ and $W(F, d)$. Since $\zeta_{1i}$ is a
minimax solution relative to $\Omega_i$ and $W^*(F, d)$, we have

(3.73) $r(F, \zeta_{1i}) \le r(F, \zeta_0)$ for all $F$ in $\Omega_i$.

Let $\{\zeta_{1 i_j}\}$ be a convergent subsequence of $\{\zeta_{1i}\}$ and let $\lim_{j \to \infty} \zeta_{1 i_j} = \zeta_1$. Then,
because of Lemma 3.3, we have

$r(F, \zeta_1) \le r(F, \zeta_0)$ for all $F$ in $\Omega$.

Since $\zeta_1$ is a limit of a sequence of Bayes solutions in the strict sense, statement
(ii) is proved.

Addition at proof reading. After this paper was sent to the printer the author
found that $\Omega$ is always separable (in the sense of the convergence definition in
Condition 3.5) and, therefore, Condition 3.5 is unnecessary. A proof of the
separability of $\Omega$ will appear in a forthcoming publication of the author.

The boundedness of $r(F, \zeta_i)$ is not necessary for the validity of Lemma 3.3.
Let $\lim_{i \to \infty} \zeta_i = \zeta_0$ and suppose that for some $F$, say $F_0$, $r(F_0, \zeta_i)$ is not bounded
in $i$. If $\liminf_{i \to \infty} r(F_0, \zeta_i) = \infty$, Lemma 3.3 obviously holds for $F = F_0$. If
$\liminf_{i \to \infty} r(F_0, \zeta_i) = g < \infty$, let $\{i_j\}$ be a subsequence of $\{i\}$ such that
$\lim_{j \to \infty} r(F_0, \zeta_{i_j}) = g$. Since $r(F_0, \zeta_{i_j})$ is a bounded function of $j$, Lemma 3.3 is
applicable and we obtain $g \ge r(F_0, \zeta_0)$. In a similar way, one can see that also
Lemma 2.4 remains valid without assuming the boundedness of $r(F, \zeta_i)$.

Although not stated explicitly, several functions considered in this paper are
assumed to be measurable with respect to certain additive classes of subsets.
In the continuous case, for example, the precise measurability assumptions may 
be stated as follows: Let B be the class of all Borel subsets of the infinite di¬ 
mensional sample space M. Let H be the smallest additive class of subsets of 
Q which contains any subset of Q which is open in the sense of at least one of 
the convergence definitions considered in this paper. Let T be the smallest 
additive class of subsets of D which contains all open subsets of D (in the sense 
of the metric d(di , d 2 , ^)). By the symbolic product H X T we mean the 
smallest additive class of subsets of the Cartesian product Q X D which con¬ 
tains the Cartesian product of any member of H by any member of T. The 
symbolic product H X B is similarly defined. It is assumed that: (1) W(F, d) 
is measurable (H X T); (2) pm (xS • • • , x”* [ F) is measurable (B X H); (3) 
is measurable (B) for any member D* of T; (4) Zrix^, • • • , z^) and 
Cr(xS • • • , xO are measurable (B), These assumptions are sufficient to insure 
the measurability (H) of r(F, f) for any f. 

REFERENCES 

[1] J. v. Neumann and Oskar Morgenstern, Theory of Games and Economic Behavior,
Princeton University Press, 1944.

[2] A. Wald, “Generalization of a theorem by v. Neumann concerning zero sum two-person
games,” Annals of Mathematics, Vol. 46 (April, 1945).

[3] A. Wald, “Foundations of a general theory of sequential decision functions,”
Econometrica, Vol. 15 (October, 1947).



THE MULTIPLICATIVE PROCESS 
By Richard Otter¹,²

University of Notre Dame 

1. Introduction and summary. The multiplicative process is usually defined
by the sequence of random variables $X_0, X_1, \ldots$ whose distributions are
specified as follows: $P(X_0 = 1) = 1$, $\sum_{\nu=0}^{\infty} P(X_1 = \nu) = 1$, and if $X_n = 0$ then
$P(X_{n+1} = 0) = 1$, whereas if $X_n$ is a positive integer then $X_{n+1}$ is distributed
as the sum of $X_n$ independent random variables each with the distribution of $X_1$.
The variable $X_n$ is interpreted as the number of “particles” in the $n$th generation,
and the index $n$ as a discrete time parameter. This has been the method of
approach in previous studies of the process [1, 2, 3, 4, 5]. The multiplicative
process has various applications, notably in the study of population growth, the
spread of epidemics or rumors, and the nuclear chain reaction. The closely
related “birth and death” process was recently studied by Kendall [6].
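The definition by generations can be made concrete with a short simulation. The following is only a minimal sketch, not part of the original paper: the function name `simulate_generations` and the finite list `offspring_probs` (standing in for the distribution of $X_1$) are assumptions introduced for illustration.

```python
import random

def simulate_generations(offspring_probs, max_generations, rng):
    """Return [X_0, X_1, ..., X_max_generations] for one run of the process.

    offspring_probs[v] plays the role of P(X_1 = v); the list is assumed
    to be finite and to sum to 1 (an assumption of this sketch).
    """
    values = range(len(offspring_probs))
    sizes = [1]                          # P(X_0 = 1) = 1
    for _ in range(max_generations):
        current = sizes[-1]
        if current == 0:                 # X_n = 0 forces X_{n+1} = 0
            sizes.append(0)
            continue
        # X_{n+1} is the sum of X_n independent copies of X_1.
        children = rng.choices(values, weights=offspring_probs, k=current)
        sizes.append(sum(children))
    return sizes

rng = random.Random(0)
print(simulate_generations([0.25, 0.5, 0.25], max_generations=10, rng=rng))
```

Each run returns one realization of the sequence of generation sizes; once a generation is empty, all later ones are empty as well.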

Whenever one studies the probability theory of a particular system there
seem to be definite conceptual advantages in defining explicitly the set $\mathcal{T}$ of
elementary events, the additive class $\mathfrak{M}$ of subsets of $\mathcal{T}$, called events, and the
probability measure $P$ for the events of $\mathfrak{M}$. Now an elementary event of this
process can be represented by a rooted tree where the original particle is represented
by the root vertex and where the particles of the $n$th generation are
represented by the vertices $n$ segments removed from the root. The tree will be
finite or infinite according to whether a finite or an infinite number of particles
are involved in the elementary event. Thus, the set of trees is the natural
choice for $\mathcal{T}$. The first part of this paper is devoted to a more precise description
of $\mathcal{T}$, $\mathfrak{M}$ and $P$. We shall then see easily that $X_n(t)$, the number of vertices $n$
segments removed from the root of $t \in \mathcal{T}$, i.e. the number of particles in the $n$th
generation, has the distribution defined in the preceding paragraph. Since the
time does not appear in our description of $\mathcal{T}$ we fetter ourselves somewhat if we
interpret $n$ as a discrete time parameter. Thus, we have already reaped some
harvest from considering the process from the point of view of $\mathcal{T}$. Another
advantage is that we are led in a natural way to study the distribution of other
structural features of the trees, e.g. the total number of vertices, or the number of
vertices with $k$ outgoing segments.

The chief results of this paper are as follows. The recursion formula for the
probability $P_n$ that a tree have $n$ vertices, $n = 1, 2, \ldots$, is obtained as well as an
asymptotic estimate of $P_n$ valid for large $n$. The distributions of the number of
branches at the root in a finite tree, an infinite tree, or in a tree with $n$ vertices
are obtained and the asymptotic distribution of the latter as $n \to \infty$. The
distribution of the fraction of vertices with $k$ outgoing segments in the finite
trees, in the trees with $n$ vertices, and the asymptotic distribution of the latter
as $n \to \infty$ are also found. Finally, an estimate is obtained for the probability
that a tree be finite in case this probability is near 1, a result which was previously
obtained by Kolmogoroff [7].

¹ Research under an Office of Naval Research contract.

² The author wishes to express his gratitude to Professor E. Artin of Princeton University
for the suggestion of this problem and his encouragement towards its solution.

2. The space of trees. We shall use the notation $\{a\}$, $\{a_1, a_2, \ldots, a_n\}$,
$\{a_j\}_{j \in J}$, and $\{a_j \mid R\}_{j \in J}$ to denote the sets which consist of respectively the single
element $a$, the elements $a_1, a_2, \ldots, a_n$, all $a_j$ with $j \in J$, and all $a_j$ with the
property $R$ and $j \in J$. We denote the union of two sets $A$ and $B$ by $A + B$, their
intersection by $AB$, and the cartesian product of $n$ identical factors each of
which is $A$ by $A^{(n)}$.

Let $I$ denote the set of positive integers. We assume given for each $n \in I$ a
countable set $U_n$ of objects called vertices, i.e.

$U_n = \{u_{i_1 i_2 \cdots i_n}\}_{(i_1, \ldots, i_n) \in I^{(n)}}$.

Let $u_0$ be a vertex distinct from all the other vertices and let $U = \{u_0\} + \sum_{n \in I} U_n$
be the collection of all the vertices. We shall interpret $u_0$ as the original parent
particle and the vertex $u_{152}$, for example, as the second son of the fifth son of the
first son of the original particle. If $s$ is a subset of $U$, $s \subset U$, and if $i_1, i_2, \ldots, i_{n+m}$
are such that $u_{i_1 \cdots i_n}, u_{i_1 \cdots i_{n+1}}, \ldots, u_{i_1 \cdots i_{n+m}}$ each belong to $s$ then
this set of vertices is called a path from $u_{i_1 \cdots i_n}$ to $u_{i_1 \cdots i_{n+m}}$ in $s$ and $m \ge 0$ is the
length of the path or the distance from $u_{i_1 \cdots i_n}$ to $u_{i_1 \cdots i_{n+m}}$. If $m = 1$ we call
the path a segment, for short.

For the sake of convenience let us agree to put ,(»>!) 

then we define $W(s, u)$, for $u \in s \subset U$, to be the number of segments from $u$ in $s$,
and we call $W(s, u)$ the type of the vertex $u$ in $s$. If $t$ is a subset of $U$, then we
call $t$ a tree if and only if

(1) $W(t, u) < \infty$ for $u \in t$
and 

(2) Uiih -in «i implies et for v = 0, 1, • • • u . 

Let $\mathcal{T}$ be the set of all trees. The condition (2) clearly implies that for each
$t \in \mathcal{T}$ we have $u_0 \in t$ and that there is a unique path from $u_0$ to any other vertex
of $t$. Hence, whenever a path exists between any two vertices of $t$ it is unique.
We call $u_0$ the root of $t$. If for $u \in t \in \mathcal{T}$ we have $W(t, u) = 0$ then $u$ is called an
endpoint of $t$, and the vertices of $t$ which are not endpoints are called inner
vertices. (It is to be noted that the objects we call trees here are rooted trees
in the sense of Cayley but our trees have their vertices numbered as well.
Usually one would identify the trees $\{u_0, u_1, u_2, u_{11}\}$ and $\{u_0, u_1, u_2, u_{21}\}$,
but we do not wish to do so because for us it is distinctly different whether the
grandson is sired by the first son or by the second son.)

For $u \in t \in \mathcal{T}$ we define the branch of $t$ at $u$, denoted $b(t, u)$, to be the set of all
vertices belonging to any path from $u$ in $t$. Our convention of admitting paths of length 0 implies
that $u \in b(t, u)$. In fact, if $W(t, u) = 0$ then $b(t, u) = \{u\}$. If $t'$ is a tree such
that $t' \subset t$ then we call $t$ an extension of $t'$, denoted $t > t'$ or $t' < t$, if $W(t', u) > 0$
implies $W(t', u) = W(t, u)$. Thus $t > t'$ is equivalent to $t \supset t'$ and

$t = t' + \sum_{u} b(t, u)$

where $u$ runs through all the endpoints of $t'$. The extension relation imposes a
partial ordering upon $\mathcal{T}$.

The extension t of t' is interpreted as a possible future aspect of a family tree 
when its structure at present is given by t', all present members of the family 
who have progeny being regarded as sterile. 

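If $u = u_{i_1 \cdots i_n} \in t$ then the mapping $\varphi$ defined for the vertices of $b(t, u)$ by
putting

$\varphi(u_{i_1 \cdots i_n}) = u_0$, $\varphi(u_{i_1 \cdots i_n i_{n+1} \cdots i_{n+m}}) = u_{i_{n+1} \cdots i_{n+m}}$

maps $b(t, u)$ one to one onto a tree $\varphi(b(t, u))$ in such a fashion that if $\{v_1, v_2\}$ is a
segment from $v_1$ to $v_2$ in $b(t, u)$ then $\{\varphi(v_1), \varphi(v_2)\}$ is a segment from $\varphi(v_1)$ to
$\varphi(v_2)$ in $\varphi(b(t, u))$. We call the mapping $\varphi$ a homeomorphism and we say that
$b(t, u)$ is homeomorphic to $\varphi(b(t, u))$.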

If a tree contains a finite number of vertices then it is called a finite tree;
otherwise it is an infinite tree. Let $\mathcal{F}$ denote the set of all finite trees and $B$ the
set of all infinite trees, and let $\mathcal{N}$ denote the set of non-negative integers. For
each $k \in \mathcal{N}$ we define $Y_k(t)$ for $t \in \mathcal{F}$ to be the number of vertices of type $k$ in $t$.
When it is clear to which tree $t$ we refer we shall usually abbreviate $Y_0(t)$ by $m$,
and we agree not to use the letter $m$ with any other connotation. For each
$T \in \mathcal{F}$ let $e_1(T), e_2(T), \ldots, e_m(T)$ denote its $m$ endpoints. We then define for
$T \in \mathcal{F}$ and $k = (k_1, k_2, \ldots, k_m) \in \mathcal{N}^{(m)}$

$[T, k] = \{t \mid t > T,\ W(t, e_i(T)) = k_i,\ i = 1, 2, \ldots, m\}_{t \in \mathcal{T}}$,

and we call $[T, k]$ a neighborhood. For each $t \in [T, k]$ we say $[T, k]$ is a neighborhood
of $t$. Then it is easy to show that $\mathcal{T}$ is a topological space where the neighborhoods
defined above form the defining system of neighborhoods [8].
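The notions of type, endpoint and the counts $Y_k(t)$ can be tried out on small finite trees. The sketch below is illustrative only: it encodes a vertex as a tuple of son indices, with the empty tuple `ROOT` standing for $u_0$, and `is_tree` encodes one assumed reading of the tree conditions (every vertex has its father in the tree and sons are numbered 1, 2, ... without gaps). None of these names come from the paper.

```python
from collections import Counter

ROOT = ()  # stands for u_0; the vertex u_152 is written (1, 5, 2)

def vertex_type(tree, u):
    """W(t, u): the number of segments leaving u, i.e. sons of u present in t."""
    return sum(1 for v in tree if len(v) == len(u) + 1 and v[:len(u)] == u)

def is_tree(tree):
    """Assumed reading of the tree conditions (a hypothesis of this sketch):
    the root is present, each vertex has its father in the tree, and the
    sons of a vertex are numbered 1, 2, ..., W(t, u) without gaps."""
    if ROOT not in tree:
        return False
    for v in tree:
        if v == ROOT:
            continue
        if v[:-1] not in tree:
            return False
        if v[-1] < 1 or (v[-1] > 1 and v[:-1] + (v[-1] - 1,) not in tree):
            return False
    return True

def type_counts(tree):
    """Y_k(t) for every type k that occurs, as a Counter keyed by k."""
    return Counter(vertex_type(tree, u) for u in tree)

def endpoints(tree):
    """The vertices of type 0, in sorted order."""
    return sorted(u for u in tree if vertex_type(tree, u) == 0)

# The two trees the text refuses to identify: the grandson hangs off
# the first son in t1 and off the second son in t2.
t1 = {ROOT, (1,), (2,), (1, 1)}
t2 = {ROOT, (1,), (2,), (2, 1)}
print(type_counts(t1), endpoints(t1), t1 != t2)
```

Both trees have the same type counts, which is why their probabilities agree even though they are distinct elements of the space.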

3. The measure theory in $\mathcal{T}$. In the following paragraphs an outline of the
measure theory in $\mathcal{T}$ is given which omits proofs for the most part since they are
easily constructed. The only point of difficulty arises in showing the measure
function to be completely additive, but here the outline has more detail.

Let $\mathfrak{S}$ be the collection of subsets of $\mathcal{T}$ such that the empty set $0 \in \mathfrak{S}$ and any other set $S$
belongs to $\mathfrak{S}$ if and only if there is a $t \in \mathcal{F}$ and a non-void “rectangle set”
$A = A_1 \times A_2 \times \cdots \times A_m \subset \mathcal{N}^{(m)}$, $m = Y_0(t)$, such that

(3) $S = \sum_{k \in A} [t, k]$

where the sets $A_1, A_2, \ldots, A_m$ may be finite or infinite sets of non-negative
integers. The collection of neighborhoods which appear as terms in (3), i.e.
$\{[t, k]\}_{k \in A}$, we call an $\mathfrak{S}$-partition of $S$, and $t$ is called the generator of the
$\mathfrak{S}$-partition. Only a finite number of $\mathfrak{S}$-partitions are possible for an $S \in \mathfrak{S}$,
because only a finite number of trees can possibly be generators and there is
only one $\mathfrak{S}$-partition per generator. With respect to our partial ordering of the
trees all possible generators lie between two particular ones. We call the
smaller of these the irreducible generator and the corresponding $\mathfrak{S}$-partition the
irreducible $\mathfrak{S}$-partition of $S$. Any partition of $S$ into neighborhoods must be a
subpartition of this irreducible $\mathfrak{S}$-partition. The elements of $\mathfrak{S}$ also display
two important properties of the rectangles in Euclidean space, namely if $S$,
$S' \in \mathfrak{S}$ then

(4) $SS' \in \mathfrak{S}$

and if $S \subset S'$ then there is a finite chain

(5) $S = S_0 \subset S_1 \subset \cdots \subset S_n = S'$

such that $S_i,\ S_i - S_{i-1} \in \mathfrak{S}$ for $i = 1, 2, \ldots, n$.

A class of sets with the properties (4) and (5) has been called a half-ring by von
Neumann [9].

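Let $p_0, p_1, \ldots$ be given non-negative numbers such that $\sum_{\nu=0}^{\infty} p_\nu = 1$. For
$t \in \mathcal{F}$ let us put

(6) $p(t) = \prod_{r=1}^{\infty} p_r^{Y_r(t)}$

with the convention $0^0 = 1$. We then define the measure function $P$ for the
sets in $\mathfrak{S}$ by

(7) $P(0) = 0$,

$P([t, k]) = \left(\prod_{i=1}^{m} p_{k_i}\right) p(t)$, where $k = (k_1, \ldots, k_m) \in \mathcal{N}^{(m)}$,

$P(S) = \sum_{k \in A} P([t, k])$, where $\{[t, k]\}_{k \in A}$ is the irreducible $\mathfrak{S}$-partition of $S$.
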
$P$ is evidently non-negative. Letting $t$ be the tree with one vertex and putting
$A = \mathcal{N}$ gives $P(\mathcal{T}) = 1$. It is easy to see that $P$ is completely additive for the
$\mathfrak{S}$-partitions of a neighborhood, but this implies $P$ is completely additive for the
$\mathfrak{S}$-partitions of an arbitrary element $S$ of $\mathfrak{S}$. In order to show that $P$ is completely
additive for any partition of $S$ into elements of $\mathfrak{S}$, it is necessary and
sufficient to show this for an arbitrary partition of a neighborhood into neighborhoods.
One may reach finer and finer partitions of a given neighborhood $N$ by
replacing a neighborhood in any one partition by an $\mathfrak{S}$-partition of the neighborhood,
and repeating the process. The sum of the measures of the sets in the
partition is invariant under such a replacement. On the other hand it can be
shown that all possible partitions of $N$ into neighborhoods may be reached in
this way. More precisely, let $\mathfrak{P} = \{N_j\}_{j \in J}$ be a partition of a neighborhood $N$
into neighborhoods $N_j$. We call $\mathfrak{P}$ reduced if whenever a subset of $\mathfrak{P}$ is an
$\mathfrak{S}$-partition of a neighborhood $M \subset N$ then the partition consists of $M$ itself, i.e.
it is the irreducible $\mathfrak{S}$-partition of $M$. Then we have the following theorem:

Theorem 1. If $\mathfrak{P}$ is a reduced partition of a neighborhood $N$ into neighborhoods
then $\mathfrak{P} = \{N\}$.

The proof is indirect and proceeds by constructing a decreasing sequence of
neighborhoods contained in $N$ whose limit is not void and yet has nothing in
common with any $N_j$, but this is a contradiction.

Let $\mathfrak{F}$ consist of all those sets which may be formed by finite unions of disjoint
elements of a half-ring $\mathfrak{S}$, then $\mathfrak{F}$ is a field of sets. If $P$ is a completely additive
measure on $\mathfrak{S}$ then its natural extension $P_1$ is completely additive on $\mathfrak{F}$ [9].
Kolmogoroff [10] has shown that the completely additive measure $P_1$ may
always be extended to a completely additive measure $P_2$ on the Borel field $\mathfrak{M}$,
i.e. the smallest additive class of sets containing $\mathfrak{F}$. Since $P_2(\mathcal{T}) = 1$, $P_2$ is a
probability measure. For simplicity we put $P_2 = P$. Let us also agree that if
$M$ is the set of all trees with the property $R$ we may write $P(R)$ instead of $P(M)$.
If $N$ is a set with $P(N) > 0$ then $P(M/N)$ shall denote the conditional probability
of $M$, given $N$, i.e. $P(M/N) = (P(N))^{-1} P(MN)$.

4. Independence of the branches. In the multiplicative process the events 
occurring in one branch of a tree are independent of those in a second branch 
disjoint with the first and it is for this reason that the process is relatively simple 
to analyze. In this section we shall try to expose the character of this 
independence. 

For $T \in \mathcal{F}$, let $\mathcal{E}_T$ be the set of all extensions of $T$, then

$\mathcal{E}_T = \sum_{k \in \mathcal{N}^{(m)}} [T, k]$,

whence by (6) and (7) $P(\mathcal{E}_T) = p(T)$. The following lemma is then easily
established.

Lemma 1. If $P(\mathcal{E}_T) > 0$ then $W(t, e_i(T))$, $i = 1, 2, \ldots, m$, under the condition
$t \in \mathcal{E}_T$, are independent random variables each with the distribution

(8) $P(W(t, e_i(T)) = k / \mathcal{E}_T) = p_k$, $k = 0, 1, 2, \ldots$.

In the particular case where $T = \{u_0\}$ we have $\mathcal{E}_T = \mathcal{T}$ and we put
$W(t) = W(t, u_0)$ for short. Thus $W(t)$ tells what type of vertex the root of $t$ is
and (8) becomes

$P(W = k) = p_k$, $k = 0, 1, 2, \ldots$.

For $t \in \mathcal{T}$ and $n = 0, 1, 2, \ldots$ let $X_n(t)$ be the number of vertices of
$t$ at distance $n$ from its root. Then $X_0(t) = 1$ and $X_1(t) = W(t)$. If $n$, $r$ are positive
integers then there is at least one $T \in \mathcal{F}$ which has $r$ of its endpoints, say
$e_{i_1}(T), e_{i_2}(T), \ldots, e_{i_r}(T)$, at distance $n$ from the root and which also satisfies
$X_{n+1}(T) = 0$. Put

$\mathcal{E}_T^{(n, r)} = \{t \mid W(t, e_j(T)) = 0,\ j \ne i_1, i_2, \ldots, i_r\}_{t \in \mathcal{E}_T}$.

Evidently for $t \in \mathcal{E}_T^{(n, r)}$

$X_{n+1}(t) = \sum_{\nu=1}^{r} W(t, e_{i_\nu}(T))$,

and a proof similar to that of Lemma 1 gives

Lemma 2. If $P(\mathcal{E}_T^{(n, r)}) > 0$ then $X_{n+1}(t)$, under the condition $t \in \mathcal{E}_T^{(n, r)}$,
is the sum of $r$ independent random variables each with the distribution of $X_1$.

By (6) and (7) for $t \in \mathcal{M} \subset \mathcal{F}\mathcal{E}_T$

$P(t) = \prod_{r=0}^{\infty} p_r^{Y_r(t)}$

which depends only upon the type of each vertex as it occurs in t. For those 
vertices which are inner vertices of T, Y,(t) is constant. Any other vertex 
belongs to one and only one of $b(t, e_1(T)), b(t, e_2(T)), \ldots, b(t, e_m(T))$ and its
type in $t$ is, of course, the same as its type in the branch to which it belongs.
Furthermore, each branch is homeomorphic to just one tree in $\mathcal{F}$,

$b(t, e_i(T)) \sim t_i$, $i = 1, 2, \ldots, m$.

Since the type of a vertex is preserved under homeomorphism we have

$P(t) = P(\mathcal{E}_T) P(t_1) P(t_2) \cdots P(t_m)$.

If, as $t$ runs through $\mathcal{M}$, $(t_1, t_2, \ldots, t_m)$ runs through $\mathcal{M}_1 \times \mathcal{M}_2 \times \cdots \times \mathcal{M}_m$,
we obtain

(10) $P(\mathcal{M}) = P(\mathcal{E}_T) P(\mathcal{M}_1) P(\mathcal{M}_2) \cdots P(\mathcal{M}_m)$.

Let us hereafter put $p = P(\mathcal{F})$. In the particular case of (10) where $\mathcal{M} = \mathcal{F}\mathcal{E}_T$
we clearly have $\mathcal{M}_i = \mathcal{F}$, $i = 1, 2, \ldots, m$, hence

(11) $P(\mathcal{F}\mathcal{E}_T) = P(\mathcal{E}_T) \cdot p^m$.


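If we define $T_\nu$, $\nu = 0, 1, 2, \ldots$, to be the tree with $\nu + 1$ vertices which has
$W(T_\nu) = \nu$ then

(12) $\mathcal{T} = \{u_0\} + \sum_{\nu=1}^{\infty} \mathcal{E}_{T_\nu}$, $\mathcal{F} = \{u_0\} + \sum_{\nu=1}^{\infty} \mathcal{F}\mathcal{E}_{T_\nu}$,

where

$\mathcal{E}_{T_i}\mathcal{E}_{T_\nu} = \{u_0\}\mathcal{E}_{T_\nu} = 0$, $i \ne \nu$,

$P(\mathcal{E}_{T_\nu}) = p_\nu$, $\nu = 1, 2, \ldots$.

From (11) and (12) we get

(13) $\sum_{\nu=0}^{\infty} p_\nu p^\nu = p$.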
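Relation (13) says that $p$ is a fixed point of the generating function of the $p_\nu$. A numerical sketch, assuming a finite list `probs` of the $p_\nu$ and relying on the standard fact that iterating $w \mapsto \sum_\nu p_\nu w^\nu$ from $w = 0$ increases to the smallest non-negative fixed point; the function names are hypothetical:

```python
def f(w, probs):
    """Generating function sum_v p_v w^v for a finite list of p_v."""
    return sum(p * w**v for v, p in enumerate(probs))

def finite_tree_probability(probs, tol=1e-12, max_iter=100_000):
    """Approximate p = P(F) by iterating w <- f(w), starting from w = 0."""
    w = 0.0
    for _ in range(max_iter):
        nxt = f(w, probs)
        if abs(nxt - w) < tol:
            return nxt
        w = nxt
    return w

# Hypothetical distribution p_0 = 1/4, p_1 = 1/4, p_2 = 1/2: the equation
# f(w) = w reads 2w^2 - 3w + 1 = 0, with roots 1/2 and 1.
print(finite_tree_probability([0.25, 0.25, 0.5]))
```

The iteration slows down when the slope of the generating function at the fixed point is close to 1, so `max_iter` may need to be raised in that case.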





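For $t \in \mathcal{F}\mathcal{E}_T$ let $Z(b(t, e_i(T)))$ be the number of vertices in the branch of $t$ at
$e_i(T)$. In the particular case where $T = \{u_0\}$ we have $b(t, u_0) = t$ and $Z(t)$
is the number of vertices of $t$. If now

$\mathcal{F}_n = \{t \mid Z(t) = n\}_{t \in \mathcal{F}}$, $n = 1, 2, \ldots$;

$P_n = P(\mathcal{F}_n)$,

then by putting $\mathcal{M} = \mathcal{M}_T^{(n_1, \ldots, n_m)}$ where

(14) $\mathcal{M}_T^{(n_1, \ldots, n_m)} = \{t \mid Z(b(t, e_i(T))) = n_i,\ i = 1, 2, \ldots, m\}_{t \in \mathcal{F}\mathcal{E}_T}$,

$\mathcal{M}_i = \mathcal{F}_{n_i}$, $i = 1, 2, \ldots, m$,

we may apply (10), which gives

(15) $P(t \in \mathcal{F}\mathcal{E}_T,\ Z(b(t, e_i(T))) = n_i,\ i = 1, 2, \ldots, m) = P(\mathcal{E}_T) P_{n_1} P_{n_2} \cdots P_{n_m}$.

If $p > 0$ we may multiply and divide the right hand member of (15) by $p^m$
which leads us to the following lemma:

Lemma 3. If $P(\mathcal{F}\mathcal{E}_T) > 0$, then $Z(b(t, e_i(T)))$, $i = 1, 2, \ldots, m$, under the
condition $t \in \mathcal{F}\mathcal{E}_T$, are $m$ independent random variables each with the distribution
of $Z(t)$, given $t \in \mathcal{F}$.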


5. The distribution of $Z(t)$. Let $f(w)$ be the generating function for the
distribution of $W$, i.e.

(16) $f(w) = \sum_{\nu=0}^{\infty} p_\nu w^\nu$

where $w$ is a complex variable. If one is interested in studying the sequence
$X_0, X_1, \ldots$ then one should define another sequence of functions $f_0, f_1, \ldots$
where $f_0(w) = w$ and $f_{n+1}(w) = f(f_n(w))$ for $n = 0, 1, 2, \ldots$. By computing
formally the expansion of $f_n(w)$ around $w = 0$ it is not difficult to show that
$f_n(w)$ is the generating function for $X_n$, i.e. $f_n(w) = \sum_{\nu=0}^{\infty} P(X_n = \nu) w^\nu$, which is
the starting point for the previous investigations of the multiplicative process.
But since we shall be mainly interested in the distribution of $Z$ we define $\mathcal{P}(z)$
to be the corresponding generating function, i.e.

(17) $\mathcal{P}(z) = \sum_{n=1}^{\infty} P_n z^n$.

Let $\rho$ and $\alpha$ be the radii of convergence of the power series in the right members
of (16) and (17) respectively. Since $f(1) = 1$ and $\mathcal{P}(1) = P(\mathcal{F}) \le 1$ we know
$\rho, \alpha \ge 1$, hence $f(w)$ and $\mathcal{P}(z)$ are analytic in $|w| < \rho$ and $|z| < \alpha$ respectively.
The relation between the distribution of $W$ and that of $Z$ is put in evidence by
the following theorem:

Theorem 2. Let

$\mathfrak{G}(z, w) = z f(w) - w$,

then $w = \mathcal{P}(z)$ is the unique analytic solution of

(18) $\mathfrak{G}(z, w) = 0$

in a certain neighborhood of $(0, 0)$.

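Proof. Since $\mathcal{P}(z)$ is analytic at 0 and $\mathcal{P}(0) = 0$ it suffices to show that if we
substitute formally $\sum P_n z^n$ for $w$ in $z \sum p_\nu w^\nu$ the coefficient of $z^n$ is uniquely
determined and is $P_n$.

(19) $z \sum_{\nu=0}^{\infty} p_\nu (\mathcal{P}(z))^\nu = p_0 z + \sum_{n=2}^{\infty} z^n \sum_{\nu=1}^{n-1} p_\nu \sum_{n_1 + \cdots + n_\nu = n-1} P_{n_1} P_{n_2} \cdots P_{n_\nu}$.

If in (14) we put $T = T_\nu$, where $T_\nu$ was defined just before (12), then
$m = Y_0(T_\nu) = \nu$. Let us require in addition that the total number of vertices
in the branches be $n - 1$, i.e. $n_1 + n_2 + \cdots + n_\nu = n - 1$, then

(20) $\mathcal{F}_n = \sum_{\nu=1}^{n-1} \sum_{n_1 + \cdots + n_\nu = n-1} \mathcal{M}_{T_\nu}^{(n_1, \ldots, n_\nu)}$, $n = 2, 3, \ldots$,

where

$\mathcal{M}_{T_i}^{(n_1, \ldots, n_i)} \mathcal{M}_{T_j}^{(m_1, \ldots, m_j)} = 0$

unless $i = j$ and $n_1 = m_1$, $n_2 = m_2$, $\ldots$, $n_i = m_i$. By applying $P$ to (20) and
using (15) we get the coefficient of $z^n$ in (19) for $n \ge 2$. This together with the
obvious fact that $P_1 = p_0$ completes the proof.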

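It is worthwhile noticing that by means of the formula of Bürmann and Lagrange
[11] we can solve the recursion formula for $P_n$ in terms of $p_0, p_1, \ldots$,
namely

(21) $P_n = \sum \frac{(n-1)!}{\nu_0!\, \nu_1! \cdots}\, p_0^{\nu_0} p_1^{\nu_1} \cdots$, the sum being taken over all $\nu_0, \nu_1, \ldots$ with $\sum_r \nu_r = n$ and $\sum_r r \nu_r = n - 1$.

Now if $t$ has $n$ vertices we know from Euler's characteristic that
$\sum_r r Y_r(t) = n - 1$. Since $P(t) = \prod_r p_r^{Y_r(t)}$ we see from (21) that
$(n-1)!/(\nu_0!\, \nu_1! \cdots)$
is the number of trees in $\mathcal{F}_n$ for which $Y_0(t) = \nu_0$, $Y_1(t) = \nu_1$, $\ldots$.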
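The recursion contained in the functional equation $\mathcal{P}(z) = z f(\mathcal{P}(z))$ and the Lagrange form $P_n = \frac{1}{n}[w^{n-1}] f(w)^n$ can be compared numerically. This is only a sketch under the assumption of a finite list `probs` of the $p_\nu$; the helper names are hypothetical.

```python
def series_mul(a, b, n_max):
    """Product of two power series (coefficient lists), truncated at degree n_max."""
    out = [0.0] * (n_max + 1)
    for i, ai in enumerate(a):
        if ai == 0.0:
            continue
        for j, bj in enumerate(b):
            if i + j > n_max:
                break
            out[i + j] += ai * bj
    return out

def tree_size_distribution(probs, n_max):
    """Return [0, P_1, ..., P_n_max] by iterating P(z) <- z f(P(z))."""
    P = [0.0] * (n_max + 1)
    for _ in range(n_max):                     # each pass fixes one more coefficient
        power = [1.0] + [0.0] * n_max          # P(z)^0
        f_of_P = [0.0] * (n_max + 1)
        for p_v in probs:
            for k in range(n_max + 1):
                f_of_P[k] += p_v * power[k]
            power = series_mul(power, P, n_max)
        P = [0.0] + f_of_P[:n_max]             # multiplication by z
    return P

def lagrange_coefficient(probs, n):
    """(1/n) times the coefficient of w^(n-1) in f(w)^n."""
    power = [1.0] + [0.0] * (n - 1)
    for _ in range(n):
        power = series_mul(power, list(probs), n - 1)
    return power[n - 1] / n

# Hypothetical example: each particle has 0 or 2 sons with probability 1/2.
probs = [0.5, 0.0, 0.5]
P = tree_size_distribution(probs, 7)
print(P[1:], [lagrange_coefficient(probs, n) for n in range(1, 8)])
```

In this example only odd $n$ carry positive probability, in line with the role played by the integer $q$ in the asymptotic theory.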

Evidently $w = \mathcal{P}(z)$ remains a solution of (18) for all $z$ such that $|z| < \alpha$,
$|w| < \rho$. In case $p_0 = 0$ the constant 0 solves (18). Hence $\mathcal{P}(z) = 0$ for all
$z$ and so $\mathcal{P}(1) = p = 0$. Conversely, if $p = 0$ then $P_1 = p_0 = 0$ which gives

Corollary 1. $p = 0$ if and only if $p_0 = 0$.

Since we wish to investigate the distribution in $\mathcal{F}$ we shall henceforth assume
$p_0 \ne 0$.

Any non-constant function $g(z)$ which has a power series development possessing
non-negative coefficients, $g(z) = \sum a_\nu z^\nu$, $a_\nu \ge 0$, with a positive radius of
convergence $R$ has two properties that are important for us:

(22) $g(z)$ has a singularity at $R$.

(23) If $\sum a_\nu R^\nu$ converges then $\sum a_\nu z_0^\nu$ converges absolutely and uniformly
for $|z_0| = R$ and so the series defines a continuous function $g(z)$ there. We
have $\lim_{z \to z_0} g(z) = \sum a_\nu z_0^\nu$ as long as the path of approach to $z_0$ lies in $|z| < R$.
On the other hand, if as $z$ approaches $R$ through real values below $R$,
$z \to R-$, the limit of $g(z)$ exists then $\sum a_\nu R^\nu$ converges. So if we put $g(R) =
\lim_{z \to R-} g(z) = \sum a_\nu R^\nu$ then the meaning is unique even allowing $\infty$ as a value.
Returning to $\mathcal{P}(z)$, if for $|z| < \alpha$ we have $|w| < \rho$ where $w = \mathcal{P}(z)$, then

(24) $z = \mathcal{P}^{-1}(w) = \frac{w}{f(w)}$

which shows the mapping is schlicht in such a domain and that the image domain
cannot contain zeros of $f(w)$. Because of (23) and the fact that $\mathcal{P}(1)$ is finite
even if $\alpha = 1$ we see that the mapping is certainly one to one for $|z| \le 1$.

Corollary 2. $p$ is the smallest root of $f(w) = w$ in $0 < w \le 1$.

Proof. (13) shows $p$ is a root in the interval. If for $0 < w_0 \le p$ we have
$f(w_0) = w_0$ then by (24) $\mathcal{P}^{-1}(w_0) = 1$, hence $w_0 = \mathcal{P}(1) = p$.

The following corollary is the well known criterion for extinction.

Corollary 3. $p = 1$ if and only if $f'(1) \le 1$.

Proof. $p = 1$, $p_0 > 0$, and the convexity of $f(w)$ in $0 \le w \le 1$ guarantee
that $(f(w) - 1)/(w - 1)$ is bounded by 1 and is monotonic increasing with $w$.
Hence $f'(1)$ exists and is $\le 1$.

Conversely, if $f'(1) \le 1$ then either $f'(w)$ is constant ($= p_1 < 1$) in $0 < w < 1$
or else it is strictly increasing with $w$ and in either case $f'(w) < 1$. The
mean value theorem gives $f(w) > w$ in $0 < w < 1$, hence $p = 1$.

Putting $a = \mathcal{P}(\alpha)$ we have the following lemma:

Lemma 4. $a \le \rho$.

Proof. We already know that $\mathcal{P}(z)$ has a unique analytic inverse given by
(24) for $|\mathcal{P}(z)| < \rho$, but on the other hand $\mathcal{P}'(z) \ne 0$ for $0 < z < \alpha$ so this
inverse is analytic for $0 < w < a$. If we had $a > \rho$ we could continue $f(w)$
analytically by means of (24) along the real axis past its singularity at $\rho$, but
this is impossible.

Corollary. $p = 1$ if and only if $a \ge 1$.

Proof. The necessity follows from the monotone behavior of $\mathcal{P}(z)$ for
$0 < z < \alpha$. Conversely, if $a \ge 1$ then $z = \mathcal{P}^{-1}(1) = 1$.

Theorem 3. If $p_0 + p_1 \ne 1$, then

(25) $\alpha$ and $a$ are finite;

(26) $f(a) = a/\alpha$;

(27) $f'(a) \le 1/\alpha$ where the strict inequality can hold only if $a = \rho$.

Proof. Let $r \ge 2$ be such that $p_r \ne 0$, then for $0 < z < \alpha$, we get from the
functional equation

$z p_r (\mathcal{P}(z))^r - \mathcal{P}(z) < 0$;

»<■'«<(?,) • 

By letting $z \to \alpha-$ we see $\alpha$ is finite and $\mathcal{P}(z)$ is bounded. Since $\mathcal{P}(z)$ is monotonic
in this region we get $a < \infty$. By letting $z \to \alpha$ in $\mathfrak{G}(z, \mathcal{P}(z))$ we get (26).
For $0 < z < \alpha$, $\mathfrak{G}_w(z, \mathcal{P}(z)) = z f'(\mathcal{P}(z)) - 1$ is continuous and monotonic increasing
with $z$ and is $< 0$ for $z$ near 0. From the general theorem on implicit
functions we know $\mathfrak{G}_w(z, \mathcal{P}(z)) \ne 0$ for $|z| < \alpha$, so if we let $z \to \alpha$ we obtain (27).

If $a = \rho$, (27) merely guarantees the finiteness of $f'(\rho)$ and gives an upper bound.
One can easily construct an example where $1/\alpha$ is the least upper bound and
one where it is not.

But if $a < \rho$ then since $\mathfrak{G}(z, w)$ is analytic at $(\alpha, a)$ and $\mathfrak{G}(\alpha, a) = 0$ we obtain
from the implicit function theorem the strict equality in (27).

Corollary. If $\alpha = 1$ then $a = p = 1$.

Proof. By (26)

(28) $f(a) = a = \mathcal{P}(1) = p \le 1$.

If $a < \rho$ then $f'(p) = 1$ so $p = 1$ from the convexity of $f(w)$. If $a = \rho$ then
$a \ge 1$ which when combined with (28) gives $a = 1$.

The case where $p_0 + p_1 = 1$ escapes Theorem 3 but it is easily examined
separately, namely

$f(w) = p_0 + p_1 w$, $p_0 \ne 0$,

$\mathcal{P}(z) = \sum_{n=1}^{\infty} p_0 p_1^{n-1} z^n = \frac{p_0 z}{1 - p_1 z}$.

Hence $p = 1$, $\alpha = 1/p_1$ and $a = \rho = \infty$.

For the practical applications of the theory it is valuable to know some
conditions which guarantee $a < \rho$, and thus strict equality in (27). From the
foregoing analysis it is evident that one such condition is $\rho = \infty$, i.e. $f(w)$ is an
entire function, and another is $f'(1) > 1$. If one has enough information about
$f(w)$ to plot its graph for real positive $w$ then the line through the origin tangent
to $f(w)$ in the first quadrant touches the curve at the point $(a, a/\alpha)$ from which
we determine both $a$ and $\alpha$.
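The tangent construction can be carried out numerically. The sketch below assumes $a < \rho$, so that (26) and equality in (27) give $a f'(a) = f(a)$ and $\alpha = a / f(a)$, and it takes Poisson offspring probabilities with a hypothetical mean `lam` as the example; the function name and the bracketing parameter `upper` are likewise assumptions of the sketch.

```python
from math import exp

def solve_tangent_point(f, f_prime, upper, tol=1e-12):
    """Bisect h(w) = w f'(w) - f(w) on (0, upper) and return (a, alpha).

    Assumes h(upper) > 0 and that upper lies inside the region of
    convergence of f; h is non-decreasing for w > 0 because
    h'(w) = w f''(w) >= 0 when the coefficients of f are non-negative.
    """
    lo, hi = 0.0, upper
    if hi * f_prime(hi) - f(hi) <= 0:
        raise ValueError("upper does not bracket the tangent point")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid * f_prime(mid) - f(mid) < 0:
            lo = mid
        else:
            hi = mid
    a = 0.5 * (lo + hi)
    return a, a / f(a)

lam = 2.0  # hypothetical Poisson mean, chosen only for illustration

def poisson_f(w):
    return exp(lam * (w - 1.0))

def poisson_f_prime(w):
    return lam * exp(lam * (w - 1.0))

a, alpha = solve_tangent_point(poisson_f, poisson_f_prime, upper=10.0)
print(a, alpha)
```

For the Poisson case the tangent condition can also be solved by hand, giving $a = 1/\lambda$ and $\alpha = e^{\lambda - 1}/\lambda$, which provides a check on the bisection.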

6. Asymptotic properties of the distributions. If we examine the terms of
the sequence $p_0, p_1, \ldots$ we may find that the indices of the non-zero terms are
all multiples of some common integer larger than 1. In this case we should
expect to have $P_n = 0$ with the same sort of regularity. So let us define $q$ to
be the largest integer such that $p_\nu \ne 0$ implies $\nu$ is a multiple of $q$. Clearly we
have $q \ge 1$ and $q = 1$ means there is no integer other than 1 which divides the
indices of all the non-zero $p_\nu$. Of course, $p_1 \ne 0$ implies $q = 1$. The following
theorem establishes an asymptotic estimate for $P_n$ valid for large $n$, provided
$n - 1$ is a multiple of $q$, and incidentally shows that $P_n = 0$, if $n - 1$ is not a
multiple of $q$.

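Theorem 4. If $a < \rho$ then

(29) $P_n = q \left( \frac{a}{2\pi \alpha f''(a)} \right)^{1/2} \alpha^{-n} n^{-3/2} + O(\alpha^{-n} n^{-5/2})$, $n \equiv 1 \pmod q$;
$P_n = 0$, $n \not\equiv 1 \pmod q$,

i.e. for large $n \equiv 1 \pmod q$

$P_n \approx q \left( \frac{a}{2\pi \alpha f''(a)} \right)^{1/2} \alpha^{-n} n^{-3/2}$.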

Proof. Let us put $\theta = 2\pi/q$, then for $|w| \le a$,

$|f(w)| = \left| \sum_{k=0}^{\infty} p_{kq} w^{kq} \right| \le \sum_{k=0}^{\infty} p_{kq} |w|^{kq} = f(|w|)$,

and the equality evidently holds if and only if $\arg w$ is an integral multiple of $\theta$.
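Furthermore, if $w$ is such that $|f(w)| = f(|w|)$ and we put $z = \mathcal{P}^{-1}(w)$ then
$w = \mathcal{P}(w/f(w))$ so we get

$\mathcal{P}(e^{i\nu\theta} z) = e^{i\nu\theta} \mathcal{P}(z)$, $\nu = 0, 1, \ldots, q - 1$,

hence $P_n = 0$, if $n \not\equiv 1 \pmod q$.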

For $|z| = \alpha$ and $w = \mathscr{P}(z)$ the point $(z, w)$ satisfies (18) by (23). If we put

$$z_\nu = \alpha e^{i\nu\theta}, \qquad w_\nu = a e^{i\nu\theta}, \qquad \nu = 0, 1, \cdots, q - 1,$$

then $w_\nu = \mathscr{P}(z_\nu)$ and

$$\mathfrak{G}_w(z_\nu, w_\nu) = z_\nu f'(w_\nu) - 1 = \alpha f'(a) - 1 = 0,$$

so that $z_0, z_1, \cdots, z_{q-1}$ are certainly singularities of $\mathscr{P}(z)$. But $f(w)$ is analytic
at $w_\nu$ and $f(w_\nu) = a/\alpha \neq 0$, so the solution of (18) for $z$,

$$z = \mathscr{P}^{-1}(w) = \frac{w}{f(w)},$$

is analytic at $w_\nu$. Furthermore

$$\frac{d}{dw}\mathscr{P}^{-1}(w_\nu) = \frac{1 - z_\nu f'(w_\nu)}{f(w_\nu)} = 0,$$

$$\frac{d^2}{dw^2}\mathscr{P}^{-1}(w_\nu) = -\frac{z_\nu f''(w_\nu)}{f(w_\nu)} = -\frac{\alpha^2 f''(a)}{w_\nu} \neq 0,$$

which shows that $\mathscr{P}(z)$ has a branch point of order 1 at each $z_\nu$, i.e. $\mathscr{P}(z)$ is an
analytic function of $(z - z_\nu)^{1/2}$ in the neighborhood of $(z_\nu, w_\nu)$, $\nu = 0, 1, \cdots, q - 1$.
For $|z| = \alpha$, $w = \mathscr{P}(z)$ but $z \neq z_\nu$ we obtain

$$|\mathfrak{G}_w(z, w)| \geq 1 - \alpha|f'(w)| > 1 - \alpha f'(|w|) \geq 1 - \alpha f'(a) = 0,$$

hence $\mathscr{P}(z)$ is an analytic function of $z$ in a certain neighborhood of such a
pair $(z, w)$.

By analytic continuation we find a circle of radius $\beta > \alpha$ such that $\mathscr{P}(z)$ is an
analytic function of $(z - z_\nu)^{1/2}$ for $|z| < \beta$. If we make radial cuts in this
circle running outward from each $z_\nu$ then in the resulting domain $D$ each of the
functions $(z - z_\nu)^{1/2}$ is an analytic function of $z$, hence so is $\mathscr{P}(z)$.

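Let $\Gamma$ be the path consisting of the boundary of $D$ oriented in the positive
sense, let $\gamma$ be the part of $\Gamma$ lying in the sector $-\pi/q \leq \arg z \leq \pi/q$,
and let $\gamma'$ be that part of $\gamma$ leading from $\beta$ to $\alpha$ along the lower lip of the cut at
$\alpha$, thence along the upper lip back to $\beta$. Since $\mathscr{P}(z)$ satisfies the relation
$\mathscr{P}(e^{i\nu\theta}z) = e^{i\nu\theta}\mathscr{P}(z)$ for $\nu = 0, 1, \cdots, q - 1$, we see from Cauchy's formula that

$$P_n = \frac{1}{2\pi i}\int_\Gamma \frac{\mathscr{P}(z)}{z^{n+1}}\,dz = \frac{A}{2\pi i}\int_\gamma \frac{\mathscr{P}(z)}{z^{n+1}}\,dz,$$

where

$$A = \sum_{\nu=0}^{q-1} e^{-i\nu\theta(n-1)} = \begin{cases} 0, & n \not\equiv 1 \pmod q;\\ q, & n \equiv 1 \pmod q. \end{cases}$$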


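Restricting ourselves to $n \equiv 1 \pmod q$ we put

$$\mathscr{P}(z) = a + b(z - \alpha)^{1/2} + c(z - \alpha) + (z - \alpha)^{3/2}\mathfrak{A}(z),$$

where $\mathfrak{A}(z)$ is analytic in $D$. Then $P_n = B + C$, where

$$B = \frac{q}{2\pi i}\int_\gamma \frac{a + b(z - \alpha)^{1/2} + c(z - \alpha)}{z^{n+1}}\,dz, \qquad C = \frac{q}{2\pi i}\int_\gamma \frac{(z - \alpha)^{3/2}\mathfrak{A}(z)}{z^{n+1}}\,dz.$$

We find

$$B = q\,|b|\,\alpha^{1/2 - n}\left|\binom{1/2}{n}\right| + O(\beta^{-n}),$$

$$|C| = O\left(\left|\int_{\gamma'} \frac{(z - \alpha)^{3/2}}{z^{n+1}}\,dz\right|\right) = O\left(\alpha^{-n}\left|\binom{3/2}{n}\right|\right).$$

The constant $b$ is determined from the equations

$$z - \alpha = \frac{1}{2}\left[\frac{d^2}{dw^2}\mathscr{P}^{-1}(w)\right]_{w=a}(w - a)^2 + \cdots, \qquad w - a = b(z - \alpha)^{1/2} + \cdots;$$

so that $b^2 = -2a/(\alpha^2 f''(a))$. Using the fact that

$$\left|\binom{1/2}{n}\right| \sim \frac{1}{2\sqrt{\pi}\,n^{3/2}}, \qquad \left|\binom{3/2}{n}\right| \sim \frac{3}{4\sqrt{\pi}\,n^{5/2}},$$

we finally obtain (29) as desired.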









Thus $P_n$ approaches zero a little faster than exponentially with $n$ regardless
of whether $p = 1$ or $p < 1$, except for the special case when $\alpha = 1$. In this
case it is interesting that, according to the corollary to lemma 4, $p = 1$.

The case where $q \neq 1$ is of no practical importance since one can always bring
$q$ back to 1 by making a very small decrease in one of the non-zero $p_\nu$ and
increasing $p_1$ by the same amount. This can clearly be done so that none of the
important characteristics of $f(w)$ is changed appreciably.


7. The limiting distributions of $W(t)$ and $n^{-1}Y_k(t)$ for $t \in \mathfrak{F}_n$. Let us
momentarily drop the condition $p_0 \neq 0$. The characteristic function of $W$ is

(30) $\varphi(\theta) = f(e^{i\theta}),$

so that for the $r$th moment of $W$ we have

(31) $E(W^r) = i^{-r}\left[\dfrac{d^r}{d\theta^r} f(e^{i\theta})\right]_{\theta=0}, \qquad r = 0, 1, 2, \cdots.$

For the first and second moments we obtain

$$E(W) = f'(1),$$

$$E(W^2) = f'(1) + f''(1),$$

which shows that the criterion for extinction (Corollary 3 to Theorem 2) may be
stated as follows: the multiplicative process is almost certain to expire if and
only if $E(W) \leq 1$. From (30) we see that all the moments of $W$ will be finite
as soon as $\rho > 1$; but if $\rho = 1$ no general statement can be made, except in case
$\alpha = 1$ also, for indeed $\alpha = 1$ implies $a = 1$ so by (31) and (27) $E(W) = f'(1) \leq 1$.
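As a small numerical companion to the moment formulas, the sketch below evaluates $E(W) = f'(1)$ and $E(W^2) = f'(1) + f''(1)$ for a finite offspring law and applies the extinction criterion $E(W) \leq 1$. The function names and the sample probabilities are illustrative assumptions only.

```python
from typing import Sequence, Tuple


def offspring_moments(p: Sequence[float]) -> Tuple[float, float]:
    """Return (E(W), E(W^2)) computed as f'(1) and f'(1) + f''(1)."""
    d1 = sum(k * pk for k, pk in enumerate(p))            # f'(1)
    d2 = sum(k * (k - 1) * pk for k, pk in enumerate(p))  # f''(1)
    return d1, d1 + d2


def almost_sure_extinction(p: Sequence[float]) -> bool:
    """Extinction is almost certain if and only if E(W) <= 1."""
    return offspring_moments(p)[0] <= 1.0


m1, m2 = offspring_moments([0.5, 0.2, 0.3])
dies_out = almost_sure_extinction([0.5, 0.2, 0.3])
survives = not almost_sure_extinction([0.2, 0.3, 0.5])
```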

We now reassume $p_0 \neq 0$. Since the variables $Z, Y_0, Y_1, \cdots$ are restricted to
$t \in \mathfrak{F}$ it is convenient to see what happens to $W$ in $\mathfrak{F}$. If we define $g(w) = p^{-1}f(pw)$
then (13) shows $g(w)$ and $g(e^{i\theta})$ are the generating function and characteristic
function respectively for $W$, given $t \in \mathfrak{F}$. Thus we see immediately that the
first moment of $W$, given $\mathfrak{F}$, is always $\leq 1$, and all its moments are finite if $p < 1$.
In case $0 < p < 1$ we may also introduce $h(w)$ defined by

(32) $f(w) = p\,g(w) + (1 - p)\,h(w),$

then $h(w)$ is obviously the generating function of $W$, given $\mathfrak{T} - \mathfrak{F}$. Here the $r$th
moment is finite whenever the $r$th moment of $W$ is finite. (32) gives

$$P(W = k/\mathfrak{T} - \mathfrak{F}) = p_k\,\frac{1 - p^k}{1 - p}, \qquad k = 1, 2, \cdots.$$
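The decomposition (32) can be illustrated on a finite offspring law. The sketch below assumes the extinction probability `ext` is already known and lies strictly between 0 and 1; it returns the coefficients of $g(w) = p^{-1}f(pw)$ and of $h(w)$, i.e. the offspring laws conditional on extinction and on survival. Names and sample numbers are ours.

```python
from typing import List, Sequence, Tuple


def conditional_offspring(p: Sequence[float], ext: float) -> Tuple[List[float], List[float]]:
    """Coefficients of g(w) = f(ext*w)/ext and of h(w) from f = ext*g + (1-ext)*h."""
    g = [pk * ext ** (k - 1) for k, pk in enumerate(p)]
    h = [pk * (1 - ext ** k) / (1 - ext) for k, pk in enumerate(p)]
    return g, h


p = [0.2, 0.3, 0.5]   # hypothetical law; f(0.4) = 0.4, so ext = 0.4
g, h = conditional_offspring(p, 0.4)
mean_given_extinction = sum(k * gk for k, gk in enumerate(g))
```

Here the conditional mean equals $f'(p) = 0.7$, in line with the remark that the first moment of $W$ given extinction never exceeds 1.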

It would be interesting to be able to compare this with the corresponding thing
for large finite trees and in this connection we have the following theorem:

Theorem 5. If $a < \rho$ and $q = 1$,

$$\lim_{n\to\infty} P(W = k/\mathfrak{F}_n) = \alpha k p_k a^{k-1}, \qquad k = 1, 2, \cdots.$$





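Proof. By expanding $z f(e^{i\theta}\mathscr{P}(z))$ in powers of $z$ we obtain

(33) $z f(e^{i\theta}\mathscr{P}(z)) = \sum_{n=1}^{\infty} \varphi_n(\theta) z^n,$

where

$$\varphi_n(\theta) = \int_{\mathfrak{F}_n} e^{i\theta W}\,dP = \sum_{\nu} e^{i\theta\nu} p_\nu \sum_{n_1 + \cdots + n_\nu = n - 1} P_{n_1} P_{n_2} \cdots P_{n_\nu},$$

so that if $P_n \neq 0$ then $P_n^{-1}\varphi_n(\theta)$ is the characteristic function of $W$, given $\mathfrak{F}_n$.
From (33) we get

$$\varphi_n(\theta) = \frac{1}{2\pi i}\int_\Gamma \frac{f(e^{i\theta}\mathscr{P}(z))}{z^{n}}\,dz.$$

Since $a < \rho$ we may expand $f(e^{i\theta}\mathscr{P}(z))$ about the point $\mathscr{P}(z) = a$ and integrate
as in the proof of theorem 4, thus

$$P_n^{-1}\varphi_n(\theta) = \alpha e^{i\theta} f'(a e^{i\theta}) + \epsilon_n(\theta).$$

Since $\epsilon_n(\theta) \to 0$ as $n \to \infty$,

$$\lim_{n\to\infty} P_n^{-1}\varphi_n(\theta) = \alpha e^{i\theta} f'(a e^{i\theta}),$$

the limit function obviously being the characteristic function for the distribution
whose generating function is $\alpha w f'(aw)$, from which the theorem follows directly.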
Now $\mathscr{P}(z)/p$ is the generating function for $Z$, given $\mathfrak{F}$, and the function solves

(34) $z g(w) - w = 0$

for $|z| \leq \alpha$. We find for the $r$th moment of $Z$

$$E(Z^r/\mathfrak{F}) = \frac{1}{p}\left[\frac{d^r \mathscr{P}(e^{i\theta})}{d(i\theta)^r}\right]_{\theta=0}, \qquad r = 0, 1, \cdots,$$

hence all the moments are finite as soon as $\alpha > 1$. Since by (34)

$$\frac{dw}{dz} = \frac{g(w)}{1 - z g'(w)} = \frac{w}{z(1 - z g'(w))},$$

we obtain for the first moment

$$E(Z/\mathfrak{F}) = \left[\frac{dw}{dz}\right]_{z=1} = \frac{1}{1 - g'(1)} = \frac{1}{1 - f'(p)}.$$

In a similar way one can express any moment of $Z$, provided it is finite, in terms
of $f'(p)$, $f''(p)$, etc. If $\alpha = 1$ we see from the corollary to theorem 3 that even
the first moment of $Z$ is infinite, except for the special case where $\rho = 1$ and
$f'(1) < 1$.
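The first-moment formula $E(Z/\mathfrak{F}) = 1/(1 - f'(p))$ can be checked numerically. The sketch below is an illustration under stated assumptions: it computes $P_n$ from the Lagrange inversion form $P_n = n^{-1}[w^{n-1}]f(w)^n$ (which we take to be equivalent to the multinomial expression (21)), truncates the sums at a finite $N$, and uses a hypothetical offspring law whose extinction probability is $0.4$.

```python
from typing import List, Sequence


def total_progeny_pmf(p: Sequence[float], N: int) -> List[float]:
    """Return [P_1, ..., P_N] with P_n = (1/n) * coefficient of w**(n-1) in f(w)**n."""
    power = [1.0] + [0.0] * (N - 1)      # f(w)**0, truncated to degree N - 1
    out = []
    for n in range(1, N + 1):
        nxt = [0.0] * N
        for i, c in enumerate(power):    # multiply the current power by f(w)
            if c:
                for k, pk in enumerate(p):
                    if i + k < N:
                        nxt[i + k] += c * pk
        power = nxt
        out.append(power[n - 1] / n)
    return out


p = [0.2, 0.3, 0.5]                      # f(0.4) = 0.4, f'(0.4) = 0.7
P = total_progeny_pmf(p, 400)
ext = sum(P)                             # approximates the extinction probability
mean_size = sum(n * Pn for n, Pn in enumerate(P, start=1)) / ext
```

With these numbers `mean_size` should be close to $1/(1 - 0.7)$; the truncation error is negligible here because $\alpha^{-1} \approx 0.93$.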

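The characteristic function of $Y_k$, given $\mathfrak{F}$, is

$$\psi_k(\theta) = \frac{1}{p}\int_{\mathfrak{F}} e^{i\theta Y_k}\,dP = \frac{1}{p}\sum_{n=1}^{\infty}\psi_{kn}(\theta),$$

where by (21)

$$\psi_{kn}(\theta) = \int_{\mathfrak{F}_n} e^{i\theta Y_k}\,dP = \sum \frac{(n - 1)!}{\nu_0!\,\nu_1!\cdots} e^{i\theta\nu_k} p_0^{\nu_0} p_1^{\nu_1}\cdots,$$

the sum being extended over $\nu_0 + \nu_1 + \cdots = n$, $\nu_1 + 2\nu_2 + \cdots = n - 1$.
Thus, if $P_n \neq 0$, $P_n^{-1}\psi_{kn}(\theta)$ is the characteristic function of $Y_k$, given $\mathfrak{F}_n$. If
$p_k = 0$ then $\psi_k(\theta) = 1$. If $p_k \neq 0$ put $p_k = e^{\beta_k}$, then

$$\left[\frac{\partial^r \psi_{kn}(\theta)}{\partial(i\theta)^r}\right]_{\theta=0} = \frac{\partial^r P_n}{\partial\beta_k^r},$$

hence

(35) $E(Y_k^r/\mathfrak{F}_n) = \dfrac{1}{P_n}\dfrac{\partial^r P_n}{\partial\beta_k^r}, \qquad E(Y_k^r/\mathfrak{F}) = \dfrac{1}{p}\left[\dfrac{\partial^r \mathscr{P}(z)}{\partial\beta_k^r}\right]_{z=1},$

which shows that all moments of $Y_k$ are finite if $\alpha > 1$. Let us put $w = \mathscr{P}(z)$,
for short, then, by (18),

(36) $\dfrac{\partial w}{\partial\beta_k} = \dfrac{z p_k w^k}{1 - z f'(w)},$

which gives for the first moment of $Y_k$,

$$E(Y_k/\mathfrak{F}) = p_k p^{k-1} E(Z/\mathfrak{F}),$$

which is to be expected since $p_k p^{k-1}$ plays the same role in $\mathfrak{F}$ that $p_k$ plays in $\mathfrak{T}$.
We may also expect that for $\mathfrak{F}_n$, $n^{-1}Y_k$ should be closely related to $p_k$. This
question is settled by the following theorem: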

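Theorem 6. If $a < \rho$ and $q = 1$ then for $x$ real

$$\lim_{n\to\infty} P(n^{-1}Y_k < x/\mathfrak{F}_n) = \begin{cases} 1, & \text{if } x > \alpha p_k a^{k-1};\\ 0, & \text{if } x < \alpha p_k a^{k-1}. \end{cases}$$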

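Proof. We intend to estimate the $r$th moment of $n^{-1}Y_k$ for $t \in \mathfrak{F}_n$ and $n$
very large from (35) by means of the contour integral

(37) $E(n^{-r}Y_k^r/\mathfrak{F}_n) = \dfrac{1}{2\pi i\,n^r P_n}\displaystyle\int_\Gamma \dfrac{\partial^r\mathscr{P}(z)/\partial\beta_k^r}{z^{n+1}}\,dz.$

So let us put

$$w = \mathscr{P}(z), \qquad w_{rs} = \frac{\partial^{r+s} w}{\partial\beta_k^r\,\partial z^s}, \qquad r, s = 0, 1, \cdots,$$

then by (36) $w_{10} = z^2 p_k w^{k-1} w_{01}$ and by Leibnitz formula, provided $k \neq 0$,

(38) $w_{r0} = z^2 p_k \displaystyle\sum_{\nu_0 + \nu_1 + \cdots + \nu_k = r - 1} \dfrac{(r - 1)!}{\nu_0!\,\nu_1!\cdots\nu_k!}\; w_{\nu_1 0}\cdots w_{\nu_{k-1} 0}\, w_{\nu_k 1}.$

The principal contribution to the integral in (37) will come from the term of
(38) which has the largest size for $z$ near $\alpha$. If we put $\zeta = (z - \alpha)^{1/2}$ then $w$ is
regular at $\zeta = 0$ and so is the constant $p_k$. Let us assume that $w_{\nu 0}$ has a pole of
order $2\nu - 1$ at $\zeta = 0$ for $\nu = 1, 2, \cdots, r - 1$, which is clearly true for $\nu = 1$.
Then if $s$ is the number of $\nu_1, \nu_2, \cdots, \nu_{k-1}$ which are $= 0$, the order of the pole
of the general term of (38) at $\zeta = 0$ is

$$\sum_{j=1}^{k-1}(2\nu_j - 1) + s + 2\nu_k + 1 = 2(r - \nu_0) - (k - s),$$

which has the maximum value $2r - 1$ if and only if $\nu_0 = \nu_1 = \cdots = \nu_{k-1} = 0$,
$\nu_k = r - 1$. Hence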

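(39) $w_{r0} = z^2 p_k w^{k-1} w_{r-1,1} + \zeta^{-(2r-2)}\mathfrak{A}_1(\zeta),$

where $\mathfrak{A}_1(\zeta)$ is a regular function of $\zeta$ at zero. For $k = 0$ the formula (38) is not
correct but it is easy to see directly that (39) is correct for $k = 0$. If we derive
(39) with respect to $z$ and put $r - 1$ for $r$ we obtain

$$w_{r-1,1} = z^2 p_k w^{k-1} w_{r-2,2} + \cdots,$$

the omitted terms having a pole of order at most $2r - 2$ at $\zeta = 0$, hence

$$w_{r0} = (z^2 p_k w^{k-1})^r\, w_{0r} + \zeta^{-(2r-2)}\mathfrak{A}_r(\zeta),$$

with $\mathfrak{A}_r(\zeta)$ regular at zero. Substituting in (37) and estimating in a manner
similar to that employed previously we obtain

$$E(n^{-r}Y_k^r/\mathfrak{F}_n) = \frac{(p_k a^{k-1})^r}{2\pi i\,n^r P_n}\int_\Gamma \frac{w_{0r}}{z^{n-2r+1}}\,dz + o(1) = (p_k a^{k-1})^r\,\frac{(n - r)(n - r - 1)\cdots(n - 2r + 1)\,P_{n-r}}{n^r P_n} + o(1),$$

and finally

$$\lim_{n\to\infty} E(n^{-r}Y_k^r/\mathfrak{F}_n) = (\alpha p_k a^{k-1})^r.$$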

The limit of the $r$th moment is itself the $r$th moment of the distribution on the
real line which has all its mass at the point $\alpha p_k a^{k-1}$. Since this distribution is
uniquely determined by its moments, a well known theorem [7] enables us to
conclude that our sequence of distributions has this distribution as limit and
this is equivalent to what is claimed by the theorem.

It is important to notice that if we put the mass $\alpha p_k a^{k-1}$ at the point $k$ this
determines a distribution on the real line because of (26).


8. The estimation of $p$. If we wish to estimate $p$ when we know $p \neq 0$, we
may obtain an estimate from the knowledge of $f(w)$ in $0 \leq w \leq 1$, using the
method of iteration. That is we choose a function $G(w)$ such that $G(p) = p$
and $|G(w) - p| < |w - p|$ for $0 < w < 1$. Then if for any $w_0$ in the open
interval we compute successively $w_1, w_2, \cdots$, where $w_{n+1} = G(w_n)$ for $n \geq 0$,
we are sure that $w_n$ converges exponentially to $p$ as $n \to \infty$.
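The method of iteration is easy to try out with the simplest admissible choice $G = f$. The sketch below is only an illustration: the starting value, the number of steps and the sample offspring law (for which $p = 0.4$ exactly) are our own assumptions.

```python
from typing import Callable


def iterate_extinction(f: Callable[[float], float], w0: float = 0.5, steps: int = 200) -> float:
    """Iterate w_{n+1} = f(w_n) from w0 in (0, 1); the iterates approach p."""
    w = w0
    for _ in range(steps):
        w = f(w)
    return w


def f(w: float) -> float:
    # hypothetical generating function with p_0 = 0.2, p_1 = 0.3, p_2 = 0.5
    return 0.2 + 0.3 * w + 0.5 * w * w


p_hat = iterate_extinction(f)
```

The error shrinks roughly like $f'(p)^n = 0.7^n$ in this example, which is the exponential convergence referred to above.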

Obviously $f(w)$ itself has the properties of $G(w)$ but we achieve faster
convergence towards $p$ using Newton's method, that is if we put

(40) $f_1(w) = f(w) - w, \qquad G(w) = w - \dfrac{f_1(w)}{f_1'(w)}.$

If for some reason we expect $p$ to be close to 1 then it is better to put

$$f_2(w) = \frac{f(w) - w}{w - 1}$$

and use $f_2(w)$ in (40) instead of $f_1(w)$, for then we may choose $w_0 = 1$.
Let us put $f'(1) = 1 + \epsilon$, $\epsilon > 0$, then

$$f_2(1) = \lim_{h\to 0} f_2(1 + h) = \lim_{h\to 0}\left(\frac{f(1 + h) - 1}{h} - 1\right) = \epsilon,$$

$$f_2'(1) = \lim_{h\to 0,\,k\to 0}\left(\frac{f(1 + h + k) - 1}{k(h + k)} - \frac{f(1 + h) - 1}{kh}\right) = \lim_{h\to 0}\frac{f(1 + 2h) - 2f(1 + h) + 1}{2h^2} = \frac{1}{2}f''(1).$$

Hence

(41) $p \approx w_1 = 1 - \dfrac{2\epsilon}{f''(1)}.$

This result was previously established by Kolmogoroff [7].
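A short numerical sketch of Newton's method applied to $f_2$ may help. It takes the first step from $w_0 = 1$ in closed form, which is exactly the approximation (41), and then continues the iteration with the explicit derivative of $f_2$. The function name, the iteration count and the Poisson example with $\lambda = 1.2$ are illustrative assumptions.

```python
import math
from typing import Callable, Tuple


def newton_extinction(f: Callable[[float], float], df: Callable[[float], float],
                      d2f_at_1: float, iters: int = 50) -> Tuple[float, float]:
    """Return (w_1, limit) for Newton's method on f_2(w) = (f(w) - w)/(w - 1), w_0 = 1."""
    eps = df(1.0) - 1.0
    w = 1.0 - 2.0 * eps / d2f_at_1           # first step, equation (41)
    first = w
    for _ in range(iters):
        phi = f(w) - w
        if phi == 0.0:
            break
        f2 = phi / (w - 1.0)
        # derivative of f_2 by the quotient rule
        df2 = ((df(w) - 1.0) * (w - 1.0) - phi) / (w - 1.0) ** 2
        w -= f2 / df2
    return first, w


lam = 1.2
f = lambda w: math.exp(lam * (w - 1.0))
df = lambda w: lam * math.exp(lam * (w - 1.0))
w1, p_hat = newton_extinction(f, df, lam * lam)
```

In this example (41) gives $w_1 \approx 0.722$, while the iteration settles near $0.686$, so the one-step value is a rough estimate that the further steps refine.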

The following two simple examples display the results of the general theory.

Example 1. We take $f(w) = p_0 + p_1 w + p_2 w^2$ where $p_0 + p_1 + p_2 = 1$
and $p_0, p_2 > 0$. We have $\rho = \infty$. From the equations (26) and (27),

$$f(a) = p_0 + p_1 a + p_2 a^2 = \frac{a}{\alpha},$$

$$f'(a) = p_1 + 2p_2 a = \frac{1}{\alpha},$$

we obtain easily

$$a = \sqrt{p_0 p_2^{-1}}, \qquad \alpha^{-1} = p_1 + 2\sqrt{p_0 p_2},$$

and it is evident that $a > 1$ is equivalent to $p_0 > p_2$ is equivalent to $f'(1) =
p_1 + 2p_2 < 1$. Now

$$\mathfrak{G}(z, w) = z p_0 + (z p_1 - 1)w + z p_2 w^2,$$

hence

(42) $\mathscr{P}(z) = \dfrac{1 - z p_1 - \sqrt{(1 - z p_1)^2 - 4z^2 p_0 p_2}}{2 z p_2},$

the choice of the sign of the radical being determined by letting $z \to 0$.

$$p = \mathscr{P}(1) = \frac{p_0 + p_2 - \sqrt{(p_0 - p_2)^2}}{2p_2} = \begin{cases} 1, & p_0 \geq p_2;\\ p_0 p_2^{-1}, & p_0 < p_2. \end{cases}$$

In the case $p_1 > 0$ we have $q = 1$ and then by (21)

$$P_n = \sum \frac{(n - 1)!}{\nu_0!\,\nu_1!\,\nu_2!}\, p_0^{\nu_0} p_1^{\nu_1} p_2^{\nu_2}, \qquad \nu_0 + \nu_1 + \nu_2 = n, \quad \nu_1 + 2\nu_2 = n - 1,$$

which can also be obtained by expansion of (42) according to powers of $z$. From
(29) we get

$$P_n \sim \frac{1}{2\sqrt{\pi}}\left(p_1\sqrt{p_0 p_2^{-3}} + 2p_0 p_2^{-1}\right)^{1/2}\left(p_1 + 2\sqrt{p_0 p_2}\right)^n n^{-3/2}.$$

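In the case $p_1 = 0$ we have $q = 2$ and obtain from (42) or from (29)

$$\mathscr{P}(z) = \sum_{\nu=1}^{\infty}(-1)^{\nu-1}\binom{1/2}{\nu}2^{2\nu-1}p_0^{\nu}p_2^{\nu-1}z^{2\nu-1} = \sum_{\nu=1}^{\infty}\frac{(2\nu - 2)!}{\nu!\,(\nu - 1)!}\,p_0^{\nu}p_2^{\nu-1}z^{2\nu-1},$$

which shows

$$P_n = \begin{cases} 0, & n = 2\nu;\\[1mm] \dfrac{(2\nu - 2)!}{\nu!\,(\nu - 1)!}\,p_0^{\nu}p_2^{\nu-1}, & n = 2\nu - 1. \end{cases}$$

By direct use of Stirling's formula or from (29) we get

$$P_{2\nu-1} \sim \frac{1}{p_2}\sqrt{\frac{2}{\pi}}\;2^{2\nu-1}(p_0 p_2)^{\nu}(2\nu - 1)^{-3/2}.$$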


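Example 2. We take $f(w) = e^{\lambda(w-1)}$, $\lambda > 0$, so that $W$ has a Poisson
distribution. Then $\rho = \infty$, $q = 1$, and we get from (26) and (27)

$$f(a) = e^{\lambda(a-1)} = a/\alpha,$$

$$f'(a) = \lambda e^{\lambda(a-1)} = 1/\alpha,$$

$$a = 1/\lambda, \qquad \alpha = \lambda^{-1}e^{\lambda-1}.$$

Clearly we have $a \geq 1$ if and only if $\lambda \leq 1$ and in this case 1 is evidently the
only solution for $w$ of $e^{\lambda(w-1)} = w$ in $0 \leq w \leq 1$, hence $p = 1$. On the other
hand if $\lambda > 1$ then (41) gives $p \approx 1 - 2(\lambda - 1)\lambda^{-2}$. By (21) we get

$$P_n = \frac{(n\lambda)^{n-1}}{n!}\,e^{-n\lambda},$$

and by direct use of Stirling's formula or from (29) we get

$$P_n \sim \frac{\lambda^{n-1}e^{n(1-\lambda)}}{\sqrt{2\pi}\;n^{3/2}}.$$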
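The Poisson example lends itself to a quick numerical comparison of the exact $P_n$ with its asymptotic form. The sketch below works with logarithms to avoid overflow; the function names, the value $\lambda = 0.8$ and the truncation point are illustrative assumptions.

```python
from math import exp, lgamma, log, pi


def log_exact(n: int, lam: float) -> float:
    """log of P_n = (n*lam)**(n-1) * exp(-n*lam) / n!"""
    return (n - 1) * log(n * lam) - n * lam - lgamma(n + 1)


def log_asymptotic(n: int, lam: float) -> float:
    """log of lam**(n-1) * exp(n*(1-lam)) / (sqrt(2*pi) * n**1.5)"""
    return (n - 1) * log(lam) + n * (1 - lam) - 0.5 * log(2 * pi) - 1.5 * log(n)


lam = 0.8                                   # lam <= 1, so p = 1
total = sum(exp(log_exact(n, lam)) for n in range(1, 3001))
ratio = exp(log_exact(2000, lam) - log_asymptotic(2000, lam))
```

For $\lambda \leq 1$ the probabilities $P_n$ should sum to $p = 1$, and `ratio` should be close to 1, differing only by the usual Stirling correction of order $1/n$.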


REFERENCES 

[1] R. A. Fisher, The Genetical Theory of Natural Selection, Oxford, The Clarendon Press, 1930.

[2] A. J. Lotka, Théorie Analytique des Associations Biologiques 2, Hermann and Co., Paris, 1939.

[3] D. Hawkins and S. Ulam, Theory of Multiplicative Processes I, MDDC-287, 1944.

[4] T. E. Harris, "Some theorems on the Bernoullian multiplicative process," thesis,
doctor of philosophy, Princeton University, 1947.

[5] A. M. Yaglom, "Certain limit theorems of the theory of branching random processes,"
Doklady Akad. Nauk. SSSR (N. S.), Vol. 66 (1947), 795-798.

[6] D. G. Kendall, "On the generalized 'birth-and-death' process," Annals of Math.
Stat., Vol. 19 (1948).

[7] A. Kolmogoroff, "Zur Lösung einer biologischen Aufgabe," Mitt. Forsch.-Inst. Math.
u. Mech. Univ. Tomsk, Vol. 2 (1938), pp. 1-6.

[8] L. Pontrjagin, Topological Groups, Princeton Univ. Press, 1946.

[9] J. von Neumann, Functional Operators, mimeographed notes, Institute for Advanced
Study, 1933-35.

[10] A. Kolmogoroff, Grundbegriffe der Wahrscheinlichkeitsrechnung, Chelsea Publishing
Co., New York, 1946.

[11] A. Hurwitz and R. Courant, Funktionentheorie, Springer, Berlin, 1929.

[12] M. G. Kendall, The Advanced Theory of Statistics, Vol. I, Griffin Co., London, 1943.



APPLICATION OF THE RADON-NIKODYM THEOREM TO THE
THEORY OF SUFFICIENT STATISTICS¹


By Paul R. Halmos² and L. J. Savage
University of Chicago

Summary. The body of this paper is written in terms of very general and
abstract ideas which have been popular in pure mathematical work on the theory 
of probability for the last two or three decades. It seems to us that these ideas, 
so fruitful in pure mathematics, have something to contribute to mathematical 
statistics also, and this paper is an attempt to illustrate the sort of contribution 
we have in mind. The purpose of generality here is not to solve immediate 
practical problems, but rather to capture the logical essence of an important 
concept (sufficient statistic), and in particular to disentangle that concept from
such ideas as Euclidean space, dimensionality, partial differentiation, and the 
distinction between continuous and discrete distributions, which seem to us 
extraneous. 

In accordance with these principles the center of the stage is occupied by a 
completely abstract sample space—that is a set X of objects x, to be thought 
of as possible outcomes of an experimental program, distributed according to an 
unknown one of a certain set of probability measures. Perhaps the most familiar 
concrete example in statistics is the one in which X is n dimensional Cartesian 
space, the points of which represent n independent observations of a normally 
distributed random variable with unknown parameters, and in which the 
probability measures considered are those induced by the various common 
normal distributions of the individual observations. 

A statistic is defined, as usual, to be a function T of the outcome, whose 
values, however, are not necessarily real numbers but may themselves be abstract 
entities. Thus, in the concrete example, the entire set of n observations, or, 
less trivially, the sequence of all sample moments about the origin are statistics 
with values in an n dimensional and in an infinite dimensional space respectively. 
Another illuminating and very general example of a statistic may be obtained as 
follows. Suppose that the outcomes of two not necessarily statistically inde¬ 
pendent programs are thought of as one united outcome—then the outcome T 
of the first program alone is a statistic relative to the united program. A 
technical measure theoretic result, known as the Radon-Nikodym theorem, is 
important in the study of statistics such as T. It is, for example, essential 
to the very definition of the basic concept of conditional probability of a subset 
$E$ of $X$ given a value $y$ of $T$.

The statistic $T$ is called sufficient for the given set $\mathfrak{M}$ of probability measures
if (somewhat loosely speaking) the conditional probability of a subset $E$ of $X$
given a value $y$ of $T$ is the same for every probability measure in $\mathfrak{M}$. It is, for
instance, well known that the sample mean and variance together form a sufficient
statistic for the measures described in the concrete example.

¹ This paper was the basis of a lecture delivered upon invitation of the Institute at the
meeting in Chicago on December 30, 1947.

² Fellow of the John Simon Guggenheim Memorial Foundation.

The theory of sufficiency is in an especially satisfactory state for the case 
in which the set $\mathfrak{M}$ of probability measures satisfies a certain condition described
by the technical term dominated. A set $\mathfrak{M}$ of probability measures is called
dominated if each measure in the set may be expressed as the indefinite integral 
of a density function with respect to a fixed measure which is not itself necessarily 
in the set. It is easy to verify that both classical extremes, commonly referred 
to as the discrete and continuous cases, are dominated. 

One possible formulation of the principal result concerning sufficiency for 
dominated sets is a direct generalization to the abstract case of the well known 
Fisher-Neyman result: T is sufficient if and only if the densities can be written as 
products of two factors, the first of which depends on the outcome through T 
only and the second of which is independent of the unknown measure. Another
way of phrasing this result is to say that $T$ is sufficient if and only if the likelihood
ratio of every pair of measures in $\mathfrak{M}$ depends on the outcome through $T$ only.
The latter formulation makes sense even in the not necessarily dominated case 
but unfortunately it is not true in that case. The situation can be patched up 
somewhat by introducing a weaker notion called pairwise sufficiency. 

In ordinary statistical parlance one often speaks of a statistic sufficient 
for some of several parameters. The abstract results mentioned above can 
undoubtedly be extended to treat this concept. 

1. Basic definitions and notations. A measurable space $(X, \mathbf{S})$ is a set $X$
and a $\sigma$-algebra $\mathbf{S}$ of subsets of $X$.³ If $(X, \mathbf{S})$ and $(Y, \mathbf{T})$ are measurable
spaces and if $T$ is a transformation from $X$ into $Y$ (or, in other words, if $T$
is a function with domain $X$ and range in $Y$), then $T$ is measurable if, for every $F$
in $\mathbf{T}$, $T^{-1}(F) \in \mathbf{S}$. If $Y$ is a Borel set in a finite dimensional Euclidean space,
then we shall always understand that $\mathbf{T}$ is the class of all Borel subsets of $Y$,
and the measurability of a function $f$ from $X$ to $Y$ will be expressed by the
notation $f\ (\epsilon)\ \mathbf{S}$.

Throughout most of what follows it will be assumed that $(X, \mathbf{S})$ and $(Y, \mathbf{T})$
are fixed measurable spaces and that $T$ is a measurable transformation (also
called a statistic) from $X$ onto $Y$. A helpful example to keep in mind is the
Cartesian plane in the role of $X$, its horizontal coordinate axis in the role of $Y$,
and perpendicular projection from $X$ onto $Y$ in the role of $T$.

The following notations will be used. If $g$ is a point function on $Y$ (with
arbitrary range), then $gT$ is the point function on $X$ defined by $gT(x) = g(T(x))$.
If $\mu$ is a set function (with arbitrary range) on $\mathbf{S}$, then $\mu T^{-1}$ is the set function
on $\mathbf{T}$ defined by $\mu T^{-1}(F) = \mu(T^{-1}(F))$. The class of all sets of the form $T^{-1}(F)$,
with $F \in \mathbf{T}$, will be denoted by $T^{-1}(\mathbf{T})$; the characteristic function of a set $A$
(in any space) will be denoted by $\chi_A$.

³ A $\sigma$-algebra is a non empty class $\mathbf{S}$ of sets, closed under the formation of complements
and countable unions. If $(X, \mathbf{S})$ is a measurable space, the sets of $\mathbf{S}$ will be called the
measurable sets of $X$.

Lemma 1. If $g$ is any function on $Y$ and $A$ is any set in the range of $g$, then

$$\{x : gT(x) \in A\} = T^{-1}(\{y : g(y) \in A\});$$

hence, in particular, $\chi_{T^{-1}(F)} = \chi_F T$ for every subset $F$ of $Y$.⁴

Proof. The following statements are mutually equivalent: (a) $x_0 \in
\{x : gT(x) \in A\}$, (b) $g(T(x_0)) \in A$, (c) if $y_0 = T(x_0)$, then $g(y_0) \in A$, and (d)
$T(x_0) \in \{y : g(y) \in A\}$. The equivalence of the first and last ones of these
statements is exactly the assertion of the lemma.

We shall have frequent occasion to deal with functions on X which are induced 
by measurable functions on Y ; the following result is a useful and direct structural 
characterization of such functions. 

Lemma 2. If $f$ is a real valued function on $X$, then a necessary and sufficient
condition that there exist a measurable function $g$ on $Y$ such that $f = gT$ is that
$f\ (\epsilon)\ T^{-1}(\mathbf{T})$; if such a function $g$ exists, then it is unique.⁵

Proof. The necessity of the condition is clear. To prove sufficiency,
suppose that $f\ (\epsilon)\ T^{-1}(\mathbf{T})$, $y_0 \in Y$, and write $X_0 = T^{-1}(\{y_0\})$. Suppose $x_0 \in X_0$
and write $E = \{x : f(x) = f(x_0)\}$. Since $f\ (\epsilon)\ T^{-1}(\mathbf{T})$, there is a set $F$ in $\mathbf{T}$ such
that $E = T^{-1}(F)$. Since $x_0 \in E$, it follows that $y_0 \in F$ and therefore that

$$X_0 = T^{-1}(\{y_0\}) \subset T^{-1}(F) = E.$$

In other words $f$ is a constant on $X_0$ and consequently the equation $g(y_0) = f(x_0)$
unambiguously defines a function $g$ on $Y$. The facts that $f = gT$ and that $g$ is
measurable are clear; the uniqueness of $g$ follows from the fact that $T$ maps
$X$ onto $Y$.
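On a finite space, with every subset of $Y$ measurable, the condition of Lemma 2 reduces to $f$ being constant on each set $T^{-1}(\{y\})$, and the proof above is constructive. The following sketch mirrors it; the function name and the toy spaces are our own assumptions, not part of the paper.

```python
from typing import Callable, Dict, Hashable, Iterable, Optional


def factor_through(X: Iterable, T: Callable, f: Callable) -> Optional[Dict[Hashable, object]]:
    """Return g (as a dict on T(X)) with f = g o T, or None if f is not constant on fibres."""
    g: Dict[Hashable, object] = {}
    for x in X:
        y = T(x)
        if y in g and g[y] != f(x):
            return None          # two points of T^{-1}({y}) give different values of f
        g[y] = f(x)              # the definition g(y0) = f(x0) used in the proof
    return g


X = range(6)
T = lambda x: x % 3              # a toy statistic from X onto Y = {0, 1, 2}
g = factor_through(X, T, lambda x: (x % 3) ** 2)
no_g = factor_through(X, T, lambda x: x)
```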

2. Measures and their derivatives. A measure is a real valued, non negative,
finite (and therefore bounded), countably additive function on the measurable
sets of a measurable space.⁶ An integral whose domain of integration is not
indicated is always to be extended over the whole space. If the symbol
$[\mu]$, pronounced "modulo $\mu$", follows an assertion concerning the points $x$ of
$X$, it is to be understood that the set $E$ of those points for which the assertion
is not true is such that $E \in \mathbf{S}$ and $\mu(E) = 0$. Thus, for instance, if $f$
and $g$ are functions (with arbitrary range) on $X$, then $f = g\ [\mu]$ means that
$\mu(\{x : f(x) \neq g(x)\}) = 0$. Similarly, if $f$ is a real valued function on $X$, then
$f\ (\epsilon)\ T^{-1}(\mathbf{T})\ [\mu]$ means that there exists a real valued function $g$ on $X$ such
that $g\ (\epsilon)\ T^{-1}(\mathbf{T})$ and $f = g\ [\mu]$.

⁴ The symbol $\{- : -\}$ stands for the set of all those objects named before the colon
which satisfy the condition stated after it.

⁵ The notation $f\ (\epsilon)\ T^{-1}(\mathbf{T})$ means of course that $f$ is a measurable function not only on the
measurable space $(X, \mathbf{S})$ but also on the measurable space $(X, T^{-1}(\mathbf{T}))$. The restriction to
real valued functions is inessential and is made only in order to avoid the introduction
of more notation.

⁶ Although most of the measures occurring in the applications of our theory are probability
measures (i.e. measures whose value for the whole space is 1), the consideration of probability
measures only is, in many of the proofs in the sequel, both unnecessary and insufficient.

If $\mu$ and $\nu$ are two measures on $\mathbf{S}$, $\nu$ is absolutely continuous with respect to $\mu$,
in symbols $\nu \ll \mu$, if $\nu(E) = 0$ for every measurable set $E$ for which $\mu(E) = 0$.
The measures $\mu$ and $\nu$ are equivalent,⁷ in symbols $\mu \equiv \nu$, if simultaneously $\mu \ll \nu$
and $\nu \ll \mu$. One of the most useful results concerning absolute continuity is the
Radon-Nikodym theorem, which may be stated as follows.⁸

A necessary and sufficient condition that $\nu \ll \mu$ is that there exist a non negative
function $f$ on $X$ such that

$$\nu(E) = \int_E f(x)\,d\mu(x)$$

for every $E$ in $\mathbf{S}$. The function $f$ is unique in the sense that if also

$$\nu(E) = \int_E g(x)\,d\mu(x)$$

for every $E$ in $\mathbf{S}$, then $f = g\ [\mu]$. If $\nu(E) \leq \mu(E)$ for every $E$ in $\mathbf{S}$, then $0 \leq f(x) \leq 1\ [\mu]$.

It is customary and suggestive to write $f = d\nu/d\mu$. Since $d\nu/d\mu$ is determined
only to within a set for which $\mu$ vanishes, it follows that in a relation of the form

$$\frac{d\nu}{d\mu}\ (\epsilon)\ T^{-1}(\mathbf{T})\ [\mu]$$

the symbol $[\mu]$ is superfluous and may be omitted.

For typographical and heuristic reasons it is convenient sometimes to write the
relation $f = d\nu/d\mu$ in the form $d\nu = f\,d\mu$; all the properties of Radon-Nikodym
derivatives which are suggested by the well known differential formalism
correspond to true theorems. Some of the ones that we shall make use of are
trivial (e.g. $d\nu_1 = f_1\,d\mu$ and $d\nu_2 = f_2\,d\mu$ imply $d(\nu_1 + \nu_2) = (f_1 + f_2)\,d\mu$), while
others are well known facts in integration theory (e.g. (i) $d\lambda = f\,d\nu$ and $d\nu = g\,d\mu$
imply $d\lambda = fg\,d\mu$, and (ii) $d\nu = f\,d\mu$ and $d\mu = g\,d\nu$ imply $fg = 1\ [\mu]$).

We conclude this section with a simple but useful result concerning the
transformations of integrals.

Lemma 3. If $g$ is a real valued function on $Y$ and $\mu$ is a measure on $\mathbf{S}$, then

$$\int_F g(y)\,d\mu T^{-1}(y) = \int_{T^{-1}(F)} gT(x)\,d\mu(x)$$

for every $F$ in $\mathbf{T}$, in the sense that if either integral exists, then so does the other and
the two are equal.

⁷ It is clear that the relation of equivalence is reflexive, symmetric, and transitive,
and hence deserves its name.

⁸ For a proof of the Radon-Nikodym theorem and similar facts concerning the measure
and integration theory which we employ, see S. Saks, Theory of the Integral, Warszawa-Lwów, 1937.





Proof. Replacing $g$ by $g\chi_F$ we see that it is sufficient to consider the case
$F = Y$. The proof for this case follows from the observation that every
approximating sum

$$\sum_i g(y_i)\,\mu T^{-1}(F_i)$$

of $\int g\,d\mu T^{-1}$ is also an approximating sum

$$\sum_i gT(x_i)\,\mu(E_i)$$

of $\int gT\,d\mu$, and conversely.⁹
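On a finite space the integrals in Lemma 3 are finite sums, and the identity can be checked directly. The sketch below is a toy illustration with exact rational arithmetic; the point masses, the map $T$ and the helper name `push_forward` are assumptions made for the example.

```python
from fractions import Fraction
from typing import Callable, Dict


def push_forward(mu: Dict[int, Fraction], T: Callable[[int], int]) -> Dict[int, Fraction]:
    """Point masses of mu T^{-1}: each y receives mu(T^{-1}({y}))."""
    out: Dict[int, Fraction] = {}
    for x, m in mu.items():
        out[T(x)] = out.get(T(x), Fraction(0)) + m
    return out


T = lambda x: x % 3
mu = {x: Fraction(x + 1, 7) for x in range(6)}   # a finite measure, total mass 3
muT = push_forward(mu, T)
g = lambda y: y * y + 1
F = {0, 2}                                        # a subset of Y = {0, 1, 2}

lhs = sum(g(y) * muT[y] for y in F)                        # integral over F of g d(mu T^{-1})
rhs = sum(g(T(x)) * m for x, m in mu.items() if T(x) in F) # integral over T^{-1}(F) of gT d(mu)
```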

3. Conditional probabilities and expectations. Lemma 4. If $\mu$ and $\nu$ are
measures on $\mathbf{S}$ such that $\nu \ll \mu$, then $\nu T^{-1} \ll \mu T^{-1}$.

Proof. If $F \in \mathbf{T}$ and $0 = \mu T^{-1}(F) = \mu(T^{-1}(F))$, then

$$0 = \nu(T^{-1}(F)) = \nu T^{-1}(F).^{10}$$

Lemma 4 is the basis of the definition of a concept of great importance in
probability theory. If $\mu$ is a measure on $\mathbf{S}$ and $f$ is a non negative integrable
function on $X$, then the measure $\nu$ defined by $d\nu = f\,d\mu$ is absolutely continuous
with respect to $\mu$. It follows from Lemma 4 that $\nu T^{-1}$ is absolutely continuous
with respect to $\mu T^{-1}$; we write $d\nu T^{-1} = g\,d\mu T^{-1}$. The function value $g(y)$ is
known as the conditional expectation of $f$ given $y$ (or given that $T(x) = y$); we
shall denote it by $e_\mu(f \mid y)$. If $f = \chi_E$ is the characteristic function of a set $E$ in
$\mathbf{S}$, then $e_\mu(f \mid y)$ is known as the conditional probability of $E$ given $y$; we shall
denote it by $p_\mu(E \mid y)$.¹¹

The abstract nature of these definitions makes an intuitive justification of
them desirable. Observe that since $\nu T^{-1}(F) = \nu(T^{-1}(F)) = \int_{T^{-1}(F)} f(x)\,d\mu(x)$,
the defining equation of $e_\mu(f \mid y)$, written out in full detail, takes the form

$$\int_{T^{-1}(F)} f(x)\,d\mu(x) = \int_F e_\mu(f \mid y)\,d\mu T^{-1}(y), \qquad F \in \mathbf{T}.$$

⁹ It is of interest to observe that either side of the equation in Lemma 3 may be obtained
from the other by the formal substitution $y = T(x)$. A special case of this lemma is the
celebrated and often misunderstood assertion that the expectation of a random variable is
equal to the first moment of its distribution function.

¹⁰ That the converse of Lemma 4 is not true is shown by the following example. Let $X$ be
the unit square, let $Y$ be the unit interval, and let $T$ be the perpendicular projection from
$X$ onto $Y$. Let $\mu$ be ordinary (Borel-Lebesgue) measure and let $\nu$ be linear measure on the
intersection of $X$ with, say, the horizontal line whose ordinate is $\frac{1}{2}$. Clearly $\nu$ is not
absolutely continuous with respect to $\mu$, but $\nu T^{-1} = \mu T^{-1}$.

¹¹ Definitions in this form were first proposed by A. Kolmogoroff, Grundbegriffe der
Wahrscheinlichkeitsrechnung, Berlin, 1933. With a slight amount of additional trouble,
conditional expectation could be defined for more general functions, but only the non
negative case will occur in our applications.

If $f = \chi_E$, then this equation becomes the defining equation of $p_\mu(E \mid y)$:¹²

$$\mu(E \cap T^{-1}(F)) = \int_F p_\mu(E \mid y)\,d\mu T^{-1}(y), \qquad F \in \mathbf{T}.$$

The customary definition of "the conditional probability of $E$ given that $T(x) \in F$"
is $\mu(E \cap T^{-1}(F))/\mu(T^{-1}(F))$, (assuming that the denominator does not vanish).
Since $\mu(T^{-1}(F)) = \mu T^{-1}(F)$, we have

$$\frac{\mu(E \cap T^{-1}(F))}{\mu(T^{-1}(F))} = \frac{1}{\mu T^{-1}(F)}\int_F p_\mu(E \mid y)\,d\mu T^{-1}(y).$$

It is now formally plausible that if "$F$ shrinks to a point $y$," then the left side
of the last written equation should tend to the conditional probability of $E$
given $y$ and the right side should tend to the integrand $p_\mu(E \mid y)$. The use of
the Radon-Nikodym differentiation theorem is a rigorous substitute for this
rather shaky difference quotient approach.
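In the discrete case the Radon-Nikodym derivative reduces to an ordinary quotient, and the defining equation of $p_\mu(E \mid y)$ can be verified for every $F$. The sketch below is a toy illustration under our own assumptions (finite $X$, every point of $Y$ with positive mass, a measure of total mass 3 rather than 1); the helper name `cond_prob` is hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from typing import Callable, Dict, Set


def cond_prob(mu: Dict[int, Fraction], T: Callable[[int], int], E: Set[int]) -> Dict[int, Fraction]:
    """p(E | y) = mu(E & T^{-1}({y})) / mu(T^{-1}({y})) for each y of positive mass."""
    num: Dict[int, Fraction] = {}
    den: Dict[int, Fraction] = {}
    for x, m in mu.items():
        y = T(x)
        den[y] = den.get(y, Fraction(0)) + m
        if x in E:
            num[y] = num.get(y, Fraction(0)) + m
    return {y: num.get(y, Fraction(0)) / d for y, d in den.items() if d != 0}


T = lambda x: x % 3
mu = {x: Fraction(x + 1, 7) for x in range(6)}   # total mass 3, not a probability measure
E = {0, 1, 5}
p = cond_prob(mu, T, E)
muT = {y: sum(m for x, m in mu.items() if T(x) == y) for y in range(3)}

# check mu(E & T^{-1}(F)) = sum over y in F of p(E | y) * muT[y], for every F
ok = all(
    sum(m for x, m in mu.items() if x in E and T(x) in F) == sum(p[y] * muT[y] for y in F)
    for r in range(4) for F in map(set, combinations(range(3), r))
)
whole_space = cond_prob(mu, T, set(mu))
```

Note that `whole_space` is identically 1 even though $\mu$ has total mass 3, which is the normalization property discussed in the text that follows.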

Since $p_\mu(E \mid y)$ is determined, for each $E$, only to within a set for which $\mu T^{-1}$
vanishes, it would be too optimistic to expect that, for each $y$, it behaves, regarded
as a function of $E$, like a measure. It is, however, easy to prove that

(i) $p_\mu(X \mid y) = 1\ [\mu T^{-1}]$,

(ii) $0 \leq p_\mu(E \mid y) \leq 1\ [\mu T^{-1}]$,

(iii) if $\{E_n\}$ is a disjoint sequence of measurable sets, then $p_\mu(\bigcup_{n=1}^{\infty} E_n \mid y) =
\sum_{n=1}^{\infty} p_\mu(E_n \mid y)\ [\mu T^{-1}]$.

The exceptional sets of measure zero depend in general on $E$ in (ii) and on the
particular sequence $\{E_n\}$ in (iii). It is interesting to observe that, despite the
fact that $\mu$ need not be a probability measure, $p_\mu$ turns out always to have the
normalization property (i). It is natural to ask whether or not the indeterminacy
of $p_\mu(E \mid y)$ may be resolved, for each $E$, in such a way that the resulting function
is a measure for each $y$, except possibly for a fixed set of $y$'s on which $\mu T^{-1}$
vanishes. Doob¹³ has shown that this is the case when $X$ is the real line; in the
general case such a resolution is impossible.¹⁴ Fortunately, however, conditional
probabilities are sufficiently tractable for most practical and theoretical purposes,
and the requirement that they should behave like probability measures in the
strict sense described above is almost never needed.

¹² We observe that it is not sufficient to require this for $F = Y$ only, i.e. to require
$\mu(E) = \int p_\mu(E \mid y)\,d\mu T^{-1}(y)$. This special equation is satisfied by many functions which
do not deserve the name conditional probability; e.g. it is satisfied by $p_\mu(E \mid y) =$
constant $= \mu(E)/\mu T^{-1}(Y)$.

¹³ See J. L. Doob, "Stochastic processes with an integral-valued parameter," Am. Math.
Soc. Trans., Vol. 44 (1938), pp. 95-98.

¹⁴ See Doob, loc. cit. Doob asserts the theorem in much greater generality, but his
proof is incorrect. The error in the proof and a counterexample to the general theorem
were communicated to us by J. Dieudonné in a letter dated September 4, 1947. Doob's
proof is valid for more general spaces than the real line (e.g. for finite dimensional Euclidean
spaces and for compact metric spaces). The details of Dieudonné's counterexample will
appear in a forthcoming book (entitled Measure theory) by Halmos.





We conclude this section with two easy but useful results which might also 
serve as illustrations of the method of finding conditional probabilities and 
expectations in certain special cases. 

Lemma 5. If $\mu$ is a measure on $\mathbf{S}$, if $g$ is a non negative function on $Y$, integrable
with respect to $\mu T^{-1}$, and if $\nu$ is the measure on $\mathbf{S}$ defined by $d\nu = gT\,d\mu$, then
$d\nu T^{-1} = g\,d\mu T^{-1}$, or, equivalently, $e_\mu(gT \mid y) = g(y)\ [\mu T^{-1}]$.

Proof. From $\nu(E) = \int_E gT(x)\,d\mu(x)$ and Lemma 3 it follows that

$$\nu T^{-1}(F) = \nu(T^{-1}(F)) = \int_F g(y)\,d\mu T^{-1}(y).$$

Lemma 6. If $\mu$ is a measure on $\mathbf{S}$, if $f$ and $g$ are non negative functions on $X$ and
$Y$ respectively, and if $f$, $gT$, and $f \cdot gT$ are all integrable with respect to $\mu$, then

$$e_\mu(f \cdot gT \mid y) = e_\mu(f \mid y) \cdot g(y)\ [\mu T^{-1}].$$

Hence, in particular, if $F \in \mathbf{T}$, then

$$p_\mu(E \cap T^{-1}(F) \mid y) = p_\mu(E \mid y)\,\chi_F(y)\ [\mu T^{-1}]$$

for every $E$ in $\mathbf{S}$.

Proof. If $d\nu = f\,d\mu$, then, by definition of $e_\mu$, $\nu T^{-1}(F) = \int_F e_\mu(f \mid y)\,d\mu T^{-1}(y)$.
Applications of Lemmas 3 and 5 yield

$$\int_F e_\mu(f \mid y)\,g(y)\,d\mu T^{-1}(y) = \int_F g(y)\,d\nu T^{-1}(y) = \int_{T^{-1}(F)} gT(x)\,d\nu(x)$$

$$= \int_{T^{-1}(F)} f(x)\,gT(x)\,d\mu(x) = \int_F e_\mu(f \cdot gT \mid y)\,d\mu T^{-1}(y),$$

and therefore the desired conclusion follows from the uniqueness assertion of the
Radon-Nikodym theorem.

4. Dominated sets of measures. In many statistical situations it is necessary
to consider simultaneously several measures on the same $\sigma$-algebra. The
concept of absolute continuity is easily extended to sets of measures. If $\mathfrak{M}$
and $\mathfrak{N}$ are two sets of measures on $\mathbf{S}$ and if, for every set $E$ in $\mathbf{S}$, the vanishing of
$\mu(E)$ for every $\mu$ in $\mathfrak{M}$ implies the vanishing of $\nu(E)$ for every $\nu$ in $\mathfrak{N}$, then we
shall call $\mathfrak{N}$ absolutely continuous with respect to $\mathfrak{M}$ and write $\mathfrak{N} \ll \mathfrak{M}$. If
$\mathfrak{N} \ll \mathfrak{M}$ and $\mathfrak{M} \ll \mathfrak{N}$, the sets $\mathfrak{M}$ and $\mathfrak{N}$ are called equivalent and we write $\mathfrak{M} \equiv \mathfrak{N}$.

If, in particular, $\mathfrak{M}$ contains exactly one measure, $\mathfrak{M} = \{\mu\}$, the abbreviated
notations $\mathfrak{N} \ll \mu$, $\mu \ll \mathfrak{N}$, and $\mu \equiv \mathfrak{N}$, will be employed for $\mathfrak{N} \ll \mathfrak{M}$, $\mathfrak{M} \ll \mathfrak{N}$, and
$\mathfrak{M} \equiv \mathfrak{N}$, respectively.

A set $\mathfrak{M}$ of measures on $\mathbf{S}$ will be called dominated if there exists a measure $\lambda$
on $\mathbf{S}$ (not necessarily in $\mathfrak{M}$) such that $\mathfrak{M} \ll \lambda$. In applications there frequently
occur sets of measures which are dominated in a sense apparently weaker than
the one just defined: weaker in that the measure $\lambda$, which may for instance be
Lebesgue measure on the Borel sets of a finite dimensional Euclidean space,
is not necessarily finite. It is easy to see, however, that whenever $\lambda$ has the
property (possessed by Lebesgue measure) that the space $X$ is the union of
countably many sets of finite measure, then a finite measure equivalent to $\lambda$
exists and the two possible definitions of domination coincide.

The following result on dominated sets of measures may be found to have 
some interest of its own and will be applied in the sequel. 

Lemma 7. Every dominated set of measures has an equivalent countable subset. 

Proof. Let $\mathfrak{M}$ be a dominated set of measures on $\mathbf{S}$, $\mathfrak{M} \ll \lambda$; for any $\mu$ in $\mathfrak{M}$
write $f_\mu = d\mu/d\lambda$ and $K_\mu = \{x: f_\mu(x) > 0\}$. We define (for the purposes of this
proof only) a kernel as a set $K$ in $\mathbf{S}$ such that, for some measure $\mu$ in $\mathfrak{M}$, $K \subset K_\mu$
and $\mu(K) > 0$; we define a chain as a disjoint union of kernels. Since $\lambda(K) > 0$
for every kernel $K$, it follows from the finiteness of $\lambda$ that every chain is a countable
disjoint union of kernels. It follows also from these definitions that if $C$ is a
measurable subset of a chain, such that $\mu(C) > 0$ for at least one measure $\mu$ in $\mathfrak{M}$,
then $C$ is a chain, and that a disjoint union of chains is a chain. The last two
remarks imply, through the usual process of disjointing any countable union,
that a countable (but not necessarily disjoint) union of chains is a chain.

Let $\{C_j\}$ be a sequence of chains such that, as $j \to \infty$, $\lambda(C_j)$ approaches
the supremum of the values of $\lambda$ on chains. If $C = \bigcup_{j=1}^{\infty} C_j$, then $C$ is a chain
for which $\lambda(C)$ is maximal. The definition of a chain yields the existence of a
sequence $\{K_i\}$ of kernels such that $C = \bigcup_{i=1}^{\infty} K_i$, and the definition of a kernel
yields the existence, for each $i = 1, 2, \cdots$, of a measure $\mu_i$ in $\mathfrak{M}$ such that
$K_i \subset K_{\mu_i}$ and $\mu_i(K_i) > 0$. We write $\mathfrak{N} = \{\mu_1, \mu_2, \cdots\}$; since $\mathfrak{N} \subset \mathfrak{M}$, the
relation $\mathfrak{N} \ll \mathfrak{M}$ is trivial. We shall prove that $\mathfrak{M} \ll \mathfrak{N}$.

Suppose that $E \in \mathbf{S}$, $\mu_i(E) = 0$ for $i = 1, 2, \cdots$, and let $\mu$ be any measure in
$\mathfrak{M}$. It is to be proved that $\mu(E) = 0$. Since $\mu(E - K_\mu) = 0$, there is no loss of
generality in assuming that $E \subset K_\mu$. If $\mu(E - C) > 0$, then $\lambda(E - C) > 0$
and therefore (since $E - C$ is a kernel) $E \cup C$ is a chain with $\lambda(E \cup C) > \lambda(C)$.
Since this is impossible, it follows that $\mu(E - C) = 0$. Since

\[ 0 = \mu_i(E) \ge \mu_i(E \cap K_i) = \int_{E \cap K_i} f_{\mu_i}\,d\lambda \]

and since $K_i \subset K_{\mu_i}$, it follows that $\lambda(E \cap K_i) = 0$.
We conclude that $\lambda(E \cap C) = 0$ and therefore $\mu(E \cap C) = 0$.
Since $\mu(E) = \mu(E - C) + \mu(E \cap C)$, the proof of the lemma is complete.

5. Sufficient statistics for dominated sets. The statistic $T$ is sufficient for a
set $\mathfrak{M}$ of measures on $\mathbf{S}$ if, for every $E$ in $\mathbf{S}$, there exists a measurable function
$p = p(E \mid y)$ on $Y$, such that

\[ p_\mu(E \mid y) = p(E \mid y) \quad [\mu T^{-1}] \]

for every $\mu$ in $\mathfrak{M}$.** In other words, $T$ is sufficient for $\mathfrak{M}$ if there exists a conditional
probability function common to every $\mu$ in $\mathfrak{M}$, or, crudely speaking, if the
conditional distribution induced by $T$ is independent of $\mu$.

** The original definition of sufficiency was given by R. A. Fisher, “On the mathematical
foundations of theoretical statistics,” Roy. Soc. Phil. Trans., Series A, Vol. 222 (1922),
pp. 309-368.

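Theorem 1. A necessary and sufficient condition that the statistic $T$ be sufficient
for a dominated set $\mathfrak{M}$ of measures on $\mathbf{S}$ is that there exist a measure $\lambda$ on $\mathbf{S}$ such that
$\mathfrak{M} \equiv \lambda$ and such that $d\mu/d\lambda \;(\in)\; T^{-1}(\mathbf{T})$ for every $\mu$ in $\mathfrak{M}$.

Proof of necessity. Let $\mathfrak{N} = \{\mu_1, \mu_2, \cdots\}$ be a countable subset equivalent
to $\mathfrak{M}$ (Lemma 7), and write $\lambda$ for the measure on $\mathbf{S}$ defined by

\[ \lambda(E) = \sum_{i=1}^{\infty} a_i\,\mu_i(E), \]

where $a_i = 1/(2^i \mu_i(X))$, $i = 1, 2, \cdots$. Clearly $\mathfrak{M} \equiv \lambda$.

If $p$ is a conditional probability function common to every $\mu$ in $\mathfrak{M}$, then, for
every $F$ in $\mathbf{T}$,

\[ \lambda(E \cap T^{-1}(F)) = \sum_{i=1}^{\infty} a_i\,\mu_i(E \cap T^{-1}(F)) \]
\[ = \sum_{i=1}^{\infty} a_i \int_F p(E \mid y)\,d\mu_i T^{-1}(y) = \int_F p(E \mid y)\,d\lambda T^{-1}(y), \]

i.e. $p$ serves also as a conditional probability for $\lambda$.

Take any fixed $\mu$ in $\mathfrak{M}$, write $d\mu/d\lambda = f$, and $e_\lambda(f \mid y) = g(y)$; then $d\mu T^{-1} = g\,d\lambda T^{-1}$,
and we have, for every $E$ in $\mathbf{S}$,

\[ \int_E f(x)\,d\lambda(x) = \mu(E) = \int p(E \mid y)\,d\mu T^{-1}(y) \]
\[ = \int p(E \mid y)\,g(y)\,d\lambda T^{-1}(y) = \int e_\lambda(\chi_E \mid y)\,e_\lambda(gT \mid y)\,d\lambda T^{-1}(y) \]
\[ = \int e_\lambda(\chi_E \cdot gT \mid y)\,d\lambda T^{-1}(y) = \int \chi_E(x)\,gT(x)\,d\lambda(x) = \int_E gT(x)\,d\lambda(x). \]

The desired result, $f(x) = gT(x)$ $[\lambda]$, follows from a comparison of the first and
last terms in the last written chain of equations.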

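Proof of sufficiency. We shall prove that $p_\lambda$ is a conditional probability
function common to every $\mu$ in $\mathfrak{M}$. Take any fixed $E$ in $\mathbf{S}$ and $\mu$ in $\mathfrak{M}$ and
write $d\mu/d\lambda = gT$. If the measure $\nu$ is defined by $d\nu = \chi_E\,d\mu$, then
$d\nu T^{-1} = p_\mu\,d\mu T^{-1}$, where $p_\mu = p_\mu(E \mid y)$. The hypothesis $d\mu = gT\,d\lambda$ implies that
$d\mu T^{-1} = g\,d\lambda T^{-1}$ and hence that

\[ d\nu T^{-1} = p_\mu\,g\,d\lambda T^{-1}. \]

On the other hand $d\nu = \chi_E\,d\mu = \chi_E \cdot gT\,d\lambda$, so that

\[ d\nu T^{-1} = e_\lambda\,d\lambda T^{-1}, \]

where $e_\lambda = e_\lambda(\chi_E \cdot gT \mid y) = p_\lambda(E \mid y)\,g(y)$. It follows from a comparison of
the two expressions for $d\nu T^{-1}$ that

\[ p_\mu(E \mid y)\,g(y) = p_\lambda(E \mid y)\,g(y) \quad [\lambda T^{-1}]. \]

Since the relation $d\mu T^{-1} = g\,d\lambda T^{-1}$ clearly implies that $g(y) \ne 0$ $[\mu T^{-1}]$ (i.e.
that $\mu T^{-1}(\{y: g(y) = 0\}) = 0$), it follows, finally, that

\[ p_\mu(E \mid y) = p_\lambda(E \mid y) \quad [\mu T^{-1}]. \]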

6. Special criteria for sufficiency. Theorem 1 may be recast in a form more
akin in spirit to previous investigations of the concept of sufficiency.

Corollary 1. A necessary and sufficient condition that the statistic $T$ be
sufficient for a dominated set $\mathfrak{M}$ ($\ll \lambda_0$) of measures on $\mathbf{S}$ is that, for every $\mu$ in $\mathfrak{M}$,
$f_\mu = d\mu/d\lambda_0$ be factorable in the form $f_\mu = g_\mu \cdot t$, where $0 \le g_\mu \;(\in)\; T^{-1}(\mathbf{T})$, $0 \le t$, $t$
and $g_\mu \cdot t$ are integrable with respect to $\lambda_0$, and $t$ vanishes $[\lambda_0]$ on each set in $\mathbf{S}$ for
which every $\mu$ in $\mathfrak{M}$ vanishes.

In more customary statistical language the condition asserts essentially that
“each density is factorable into a function of the statistic alone and a function
independent of the parameter.”

Proof. If $T$ is sufficient for $\mathfrak{M}$, then there exists a measure $\lambda$ with the
properties described in Theorem 1. It follows that

\[ f_\mu = \frac{d\mu}{d\lambda_0} = \frac{d\mu}{d\lambda} \cdot \frac{d\lambda}{d\lambda_0}, \]

and we may write $g_\mu = d\mu/d\lambda$ and $t = d\lambda/d\lambda_0$. The only assertion that is not
immediately obvious is the one concerning the vanishing of $t$. To prove it,
suppose that $\mu(E) = 0$ for every $\mu$ in $\mathfrak{M}$; the fact that then

\[ 0 = \lambda(E) = \int_E t(x)\,d\lambda_0(x) \]

implies the desired conclusion.

If, conversely, $f_\mu = g_\mu \cdot t$, then we may write $d\lambda = t\,d\lambda_0$. The relation $\mathfrak{M} \equiv \lambda$
follows from the statement concerning the vanishing of $t$, and the relation
$d\mu/d\lambda \;(\in)\; T^{-1}(\mathbf{T})$ is implied by the equation $d\mu = g_\mu \cdot t\,d\lambda_0 = g_\mu\,d\lambda$.
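As an illustration of the factorization criterion, the following minimal sketch takes a finite case that is not in the original text: $n$ tosses of a coin, with counting measure on the $2^n$ sequences playing the role of the dominating measure and $T(x)$ the number of heads. The density $\theta^k(1-\theta)^{n-k}$ is then a function of $T$ alone (with $t \equiv 1$), and the sketch checks by enumeration that the conditional probability of a sequence given $T$ is the same for every $\theta$. The names `density` and `conditional_given_T` and the particular values of $\theta$ are assumptions made for the example.

```python
from fractions import Fraction
from itertools import product
from math import comb


def density(x, theta):
    """Density of the 0/1 sequence x with respect to counting measure."""
    k = sum(x)  # T(x): the number of heads
    return theta**k * (1 - theta) ** (len(x) - k)


def conditional_given_T(x, theta):
    """P_theta(X = x | T = T(x)), computed directly from the definition."""
    n, k = len(x), sum(x)
    total = sum(density(z, theta)
                for z in product((0, 1), repeat=n) if sum(z) == k)
    return density(x, theta) / total


n = 4
thetas = [Fraction(1, 2), Fraction(1, 4), Fraction(2, 3)]  # illustrative values
for x in product((0, 1), repeat=n):
    values = {conditional_given_T(x, th) for th in thetas}
    # One common value for every theta: 1 / C(n, k).
    assert values == {Fraction(1, comb(n, sum(x)))}
```

Exact rational arithmetic is used so that "independent of the parameter" is checked as an identity rather than up to rounding.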

For the statement of the next consequence of Theorem 1 it is convenient to
call a set $\mathfrak{M}$ of measures on $\mathbf{S}$ homogeneous if $\mu \equiv \nu$ for every $\mu$ and $\nu$ in $\mathfrak{M}$.

Corollary 2. A necessary and sufficient condition that the statistic $T$ be
sufficient for a homogeneous set $\mathfrak{M}$ of measures on $\mathbf{S}$ is that, for every $\mu$ and $\nu$ in $\mathfrak{M}$,
$d\nu/d\mu \;(\in)\; T^{-1}(\mathbf{T})$.

Proof. Since a homogeneous set is dominated (by any one of its elements),
Theorem 1 is applicable. If $T$ is sufficient for $\mathfrak{M}$ and if $\lambda$ has the properties
described in Theorem 1, then $d\nu/d\mu = (d\nu/d\lambda)/(d\mu/d\lambda)$. The converse follows,
through Theorem 1, by letting $\lambda$ be any measure in $\mathfrak{M}$.

See J. Neyman, “Su un teorema concernente le cosiddette statistiche sufficienti,” Inst.
Ital. Atti. Giorn., Vol. 6 (1935), pp. 320-334. In this paper Neyman is somewhat restricted
by his use of classical analytical methods, but he points out the possibility and desirability
of extending his results to a much more general domain. For a recent presentation of the
theory and further references to the literature cf. H. Cramér, Mathematical Methods of
Statistics, Princeton, 1946.

We shall say that the statistic $T$ is pairwise sufficient for a set $\mathfrak{M}$ of measures
on $\mathbf{S}$ if it is sufficient for every pair $\{\mu, \nu\}$ of measures in $\mathfrak{M}$. In other words,
$T$ is pairwise sufficient for $\mathfrak{M}$ if, for every $E$ in $\mathbf{S}$ and $\mu$ and $\nu$ in $\mathfrak{M}$, there exists a
measurable function $p_{\mu\nu}(E \mid y)$ on $Y$ such that

\[ p_\mu(E \mid y) = p_{\mu\nu}(E \mid y) \quad [\mu T^{-1}] \qquad \text{and} \qquad p_\nu(E \mid y) = p_{\mu\nu}(E \mid y) \quad [\nu T^{-1}]. \]

Since pairwise sufficiency is (at least apparently) weaker than sufficiency, it is
not surprising that there is a simple criterion for it even in the case of quite
arbitrary (not necessarily homogeneous or dominated) sets of measures.

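Corollary 3. A necessary and sufficient condition that $T$ be pairwise sufficient
for a set $\mathfrak{M}$ of measures on $\mathbf{S}$ is that, for any two measures $\mu$ and $\nu$ in $\mathfrak{M}$,
$d\mu/d(\mu + \nu) \;(\in)\; T^{-1}(\mathbf{T})$.

Proof. If $T$ is sufficient for $\mu$ and $\nu$, then there exists a measure $\lambda \equiv \mu + \nu$
such that $d\mu/d\lambda \;(\in)\; T^{-1}(\mathbf{T})$ and $d\nu/d\lambda \;(\in)\; T^{-1}(\mathbf{T})$. It follows that

\[ \frac{d\mu}{d(\mu + \nu)} = \frac{d\mu}{d\lambda} \Big/ \frac{d(\mu + \nu)}{d\lambda} = \frac{d\mu}{d\lambda} \Big/ \left( \frac{d\mu}{d\lambda} + \frac{d\nu}{d\lambda} \right) \;(\in)\; T^{-1}(\mathbf{T}). \]

The sufficiency of the condition follows immediately by applying Theorem 1
to the two-element set $\{\mu, \nu\}$.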

7. Pairwise sufficiency and likelihood ratios. It is sometimes convenient to
express the result of Corollary 3 in slightly different language. If $\lambda$ is a measure
on $\mathbf{S}$ and if $f$ and $g$ are real valued measurable functions on $X$ such that
$\lambda(\{x: f(x) = g(x) = 0\}) = 0$, we shall say that the pair $(f, g)$ is admissible $[\lambda]$.
(Intuitively an admissible pair $(f, g)$ is to be thought of as a ratio $f/g$, which,
however, may not be formed directly at the points $x$ for which $g(x) = 0$.) Two
admissible pairs $(f_1, g_1)$ and $(f_2, g_2)$ will be called equivalent $[\lambda]$, in symbols
$(f_1, g_1) \sim (f_2, g_2)$ $[\lambda]$, if there exists a real valued measurable function $t$ on $X$
such that $t(x) \ne 0$ $[\lambda]$ and such that $f_1 = t f_2$ and $g_1 = t g_2$ $[\lambda]$. It is clear that the
relation “$\sim [\lambda]$” is indeed an equivalence; the equivalence class containing the
admissible pair $(f, g)$ will be called the ratio of $f$ and $g$ and will be denoted by
$f \mid g$. (A ratio may accordingly be described as a measurable function from $X$
to the real projective line.) For a ratio $f \mid g$ we shall write $f \mid g \;(\in)\; T^{-1}(\mathbf{T})$ $[\lambda]$
if the equivalence class $f \mid g$ contains a pair $(f_0, g_0)$ which is admissible $[\lambda]$ and
for which $f_0 \;(\in)\; T^{-1}(\mathbf{T})$ and $g_0 \;(\in)\; T^{-1}(\mathbf{T})$.

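Lemma 8. If $\mu$, $\nu$, $\lambda_1$, and $\lambda_2$ are measures on $\mathbf{S}$ such that $\mu + \nu \ll \lambda_1$ and
$\mu + \nu \ll \lambda_2$, then the pairs $(d\mu/d\lambda_1, d\nu/d\lambda_1)$ and $(d\mu/d\lambda_2, d\nu/d\lambda_2)$ are admissible
$[\mu + \nu]$ and equivalent $[\mu + \nu]$.

Proof. The admissibility of, for instance, $(d\mu/d\lambda_1, d\nu/d\lambda_1)$ follows from the
fact that $d\mu/d\lambda_1 \ne 0$ $[\mu]$ and $d\nu/d\lambda_1 \ne 0$ $[\nu]$, whence

\[ (\mu + \nu)\left(\left\{x: \frac{d\mu}{d\lambda_1}(x) = \frac{d\nu}{d\lambda_1}(x) = 0\right\}\right) = 0. \]

To prove equivalence, we write $\lambda_1 + \lambda_2 = \lambda$. Since

\[ \frac{d\mu}{d\lambda_1} \frac{d\lambda_1}{d\lambda} = \frac{d\mu}{d\lambda} = \frac{d\mu}{d\lambda_2} \frac{d\lambda_2}{d\lambda}, \qquad \frac{d\nu}{d\lambda_1} \frac{d\lambda_1}{d\lambda} = \frac{d\nu}{d\lambda} = \frac{d\nu}{d\lambda_2} \frac{d\lambda_2}{d\lambda}, \]

since also $d\lambda_1/d\lambda \ne 0$ $[\lambda_1]$ and therefore $d\lambda_1/d\lambda \ne 0$ $[\mu + \nu]$, and since, similarly,
$d\lambda_2/d\lambda \ne 0$ $[\mu + \nu]$, the conditions of the definition of equivalence are satisfied
by $t = (d\lambda_2/d\lambda)/(d\lambda_1/d\lambda)$.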

If $\mu$ and $\nu$ are any two measures on $\mathbf{S}$ and if $\lambda$ is any measure on $\mathbf{S}$ such that
$\mu + \nu \ll \lambda$ (for instance if $\lambda = \mu + \nu$), then the ratio $d\mu/d\lambda \mid d\nu/d\lambda$, which
according to Lemma 8 exists $[\mu + \nu]$ and is independent of $\lambda$, will be called the
likelihood ratio of $\mu$ and $\nu$ and will be denoted by $d\mu \mid d\nu$. The result of Corollary
3 may be expressed in terms of likelihood ratios as follows.

Theorem 2. A necessary and sufficient condition that $T$ be pairwise sufficient
for a set $\mathfrak{M}$ of measures on $\mathbf{S}$ is that, for any two measures $\mu$ and $\nu$ in $\mathfrak{M}$,
$d\mu \mid d\nu \;(\in)\; T^{-1}(\mathbf{T})$.

Proof. If $T$ is sufficient for $\mu$ and $\nu$, then, by Corollary 3, $d\mu/d(\mu + \nu) \;(\in)\; T^{-1}(\mathbf{T})$,
$d\nu/d(\mu + \nu) \;(\in)\; T^{-1}(\mathbf{T})$, and, by Lemma 8, $(d\mu/d(\mu + \nu), d\nu/d(\mu + \nu))$ is
an admissible pair belonging to the equivalence class $d\mu \mid d\nu$. Suppose conversely
that $f = d\mu/d(\mu + \nu)$, $g = d\nu/d(\mu + \nu)$, and let the real valued measurable
functions $t$, $f_0$, and $g_0$ be such that $t \ne 0$ $[\mu + \nu]$, $f_0 \;(\in)\; T^{-1}(\mathbf{T})$, $g_0 \;(\in)\; T^{-1}(\mathbf{T})$,
$(f_0, g_0)$ is admissible $[\mu + \nu]$, and

\[ f = t \cdot f_0, \qquad g = t \cdot g_0 \quad [\mu + \nu]. \]

Since $f$ and $g$ are non negative, it follows that $f = |t| \cdot |f_0|$ and $g = |t| \cdot |g_0|$
$[\mu + \nu]$, i.e. that there is no loss of generality in assuming that $t$, $f_0$, and $g_0$ are
non negative. The relation $f + g = 1$ $[\mu + \nu]$ implies that $t \cdot (f_0 + g_0) = 1$
$[\mu + \nu]$; the fact that $(f_0, g_0)$ is admissible $[\mu + \nu]$ then yields $t \;(\in)\; T^{-1}(\mathbf{T})$. The
proof is completed by comparing this result with the expressions for $f$ and $g$ in
terms of $f_0$ and $g_0$ and applying Corollary 3.
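The criterion of Corollary 3 can be checked by hand in a small discrete case. The sketch below is an illustration only, under assumptions not made in the text: $\mu$ and $\nu$ are the distributions of three tosses of a coin with heads ratio one half and one quarter, and $T$ counts heads. It computes $d\mu/d(\mu + \nu)$ pointwise and confirms that it takes a single value on each set $\{x: T(x) = k\}$, i.e. that it depends on $x$ only through $T(x)$.

```python
from fractions import Fraction
from itertools import product


def prob(x, p):
    """Probability of the 0/1 sequence x when the heads ratio is p."""
    k = sum(x)
    return p**k * (1 - p) ** (len(x) - k)


n = 3
points = list(product((0, 1), repeat=n))
mu = {x: prob(x, Fraction(1, 2)) for x in points}  # heads ratio 1/2
nu = {x: prob(x, Fraction(1, 4)) for x in points}  # heads ratio 1/4

# d(mu)/d(mu + nu) on a finite space is a ratio of point masses.
f = {x: mu[x] / (mu[x] + nu[x]) for x in points}

# Group the values of f by the statistic T(x) = number of heads.
by_heads = {}
for x, value in f.items():
    by_heads.setdefault(sum(x), set()).add(value)

# Exactly one value per level set of T: f is a function of T alone.
assert all(len(values) == 1 for values in by_heads.values())
```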

8. Pairwise sufficiency versus sufficiency. In order to show that our results
on pairwise sufficiency (in the preceding section and in the sequel) are not
vacuous, we proceed now to exhibit a statistic which is, for a suitable set of
measures, pairwise sufficient but not sufficient.

Let $X = \{(x, i): 0 \le x \le 1,\ i = 0, 1\}$ be the union of two unit intervals and
let $Y = \{y: 0 \le y \le 1\}$ be a unit interval. In accordance with our basic
convention, measurability in both $X$ and $Y$ is to be taken in the sense of Borel.
The statistic $T$ is defined by $T(x, i) = x$.

Write $X_0 = \{(x, 0): 0 \le x \le 1\}$ and $X_1 = \{(x, 1): 0 \le x \le 1\}$. Let $\mu$ be
(linear) Lebesgue measure on the class $\mathbf{S}$ of Borel subsets of $X$, and define,
whenever $E \in \mathbf{S}$ and $0 \le \alpha \le 1$,

\[ \mu_\alpha(E) = \tfrac{1}{2}\,[\mu(E \cap X_0) + \chi_E(\alpha, 1)]. \]

Let $\nu$ be (linear) Lebesgue measure on the class $\mathbf{T}$ of Borel subsets of $Y$, and
define, whenever $F \in \mathbf{T}$ and $0 \le \alpha \le 1$,

\[ \nu_\alpha(F) = \tfrac{1}{2}\,[\nu(F) + \chi_F(\alpha)]. \]

Clearly $\nu_\alpha = \mu_\alpha T^{-1}$; we write $\mathfrak{M} = \{\mu_\alpha: 0 \le \alpha \le 1\}$.



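If $\delta(y, \alpha)$ is defined to be 1 or 0 according as $y = \alpha$ or $y \ne \alpha$, if $\delta'(y, \alpha) =
1 - \delta(y, \alpha)$, and if

\[ p_\alpha(E \mid y) = \delta'(y, \alpha)\,\chi_E(y, 0) + \delta(y, \alpha)\,\chi_E(y, 1), \]

then a straightforward computation shows that

\[ \mu_\alpha(E \cap T^{-1}(F)) = \int_F p_\alpha(E \mid y)\,d\nu_\alpha(y), \]

so that $p_\alpha(E \mid y) = p_{\mu_\alpha}(E \mid y)$ $[\nu_\alpha]$.

It is now easy to verify that $T$ is pairwise sufficient for $\mathfrak{M}$. Indeed if $\alpha$ and $\beta$
are any two different numbers in the closed unit interval, we may write

\[ p(E \mid y) = \delta'(y, \alpha)\,\delta'(y, \beta)\,\chi_E(y, 0) + [\delta(y, \alpha) + \delta(y, \beta)]\,\chi_E(y, 1). \]

Since $\{y: p(E \mid y) \ne p_\alpha(E \mid y)\} \subset \{\beta\}$ and $\{y: p(E \mid y) \ne p_\beta(E \mid y)\} \subset \{\alpha\}$, and since
$\nu_\alpha(\{\beta\}) = \nu_\beta(\{\alpha\}) = 0$, it
follows that $p(E \mid y) = p_\alpha(E \mid y)$ $[\nu_\alpha]$ and $p(E \mid y) = p_\beta(E \mid y)$ $[\nu_\beta]$.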

To prove that $T$ is not sufficient for $\mathfrak{M}$ we observe that $p_\alpha(X_1 \mid y) =
\delta(y, \alpha)\,\chi_{X_1}(y, 1) = \delta(y, \alpha)$ and therefore

\[ p_{\mu_\alpha}(X_1 \mid y) = \delta(y, \alpha) \quad [\nu_\alpha]. \]

Suppose that there is a conditional probability function $p$ such that $p(E \mid y) =
p_{\mu_\alpha}(E \mid y)$ $[\nu_\alpha]$ for every $\alpha$. Then, in particular,

\[ p(X_1 \mid y) = \delta(y, \alpha) \quad [\nu_\alpha]. \]

Since $\nu_\alpha(\{\alpha\}) = \tfrac{1}{2} > 0$, it follows that

\[ p(X_1 \mid \alpha) = \delta(\alpha, \alpha) = 1, \]

or, changing to a more suggestive notation, that $p(X_1 \mid y) = 1$ for all $y$. We
have, however,

\[ \nu_\alpha(\{y: p_{\mu_\alpha}(X_1 \mid y) = 0\}) = \nu_\alpha(\{y: \delta(y, \alpha) = 0\}) = \nu_\alpha(\{y: y \ne \alpha\}) = \tfrac{1}{2}, \]

so that $\nu_\alpha(\{y: p(X_1 \mid y) = 0\}) = \tfrac{1}{2}$. This contradiction shows the impossibility
of the existence of a conditional probability function common to every $\mu$ in $\mathfrak{M}$.

This example shows also that, in a sense, sufficiency is more fundamental
than pairwise sufficiency. If, for instance, we imagine that it is important to a
statistician that he either estimate $\alpha$ sharply or refrain from estimating it
altogether, then he is by no means as well off with the observation of $y$ as with
that of $x$.

9. Pairwise sufficiency for dominated sets. We now proceed to show that
for dominated sets of measures no such example as the one in the preceding
section exists, or, in other words, that for dominated sets the concepts of pairwise
sufficiency and sufficiency do coincide.

Lemma 9. If $T$ is pairwise sufficient for a set $\{\mu_0, \mu_1, \mu_2\}$ of three measures
on $\mathbf{S}$, then

\[ d\mu_0/d(\mu_0 + \mu_1 + \mu_2) \;(\in)\; T^{-1}(\mathbf{T}). \]

Proof. According to Corollary 3,

\[ f_1 = d\mu_0/d(\mu_0 + \mu_1) \;(\in)\; T^{-1}(\mathbf{T}) \qquad \text{and} \qquad f_2 = d\mu_0/d(\mu_0 + \mu_2) \;(\in)\; T^{-1}(\mathbf{T}). \]

Since $d\mu_0 = f_1\,d(\mu_0 + \mu_1) = f_2\,d(\mu_0 + \mu_2)$, we have $f_1\,d\mu_0 = f_1 f_2\,d(\mu_0 + \mu_2)$ and
$f_2\,d\mu_0 = f_1 f_2\,d(\mu_0 + \mu_1)$, so that

\[ (f_1 + f_2 - f_1 f_2)\,d\mu_0 = f_1 f_2\,d(\mu_0 + \mu_1 + \mu_2). \]

If we write $d\mu_0 = f\,d(\mu_0 + \mu_1 + \mu_2)$, then it follows that

\[ (f_1 + f_2 - f_1 f_2)\,f = f_1 f_2 \quad [\mu_0 + \mu_1 + \mu_2]. \]

Since $0 \le f_1 \le 1$ and $0 \le f_2 \le 1$, the equation $f_1 + f_2 - f_1 f_2 = 0$ is equivalent
to $f_1 = f_2 = 0$. Since $\mu_0(\{x: f_1(x) = f_2(x) = 0\}) = 0$, it follows that $f$ may be
redefined, if necessary, to be 0 on the set $\{x: f_1(x) = f_2(x) = 0\}$ without affecting
the relation $d\mu_0 = f\,d(\mu_0 + \mu_1 + \mu_2)$; since outside this set $f = f_1 f_2/(f_1 + f_2 - f_1 f_2)$,
the proof of the lemma is complete.
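The closing formula of the proof, $f = f_1 f_2/(f_1 + f_2 - f_1 f_2)$, is easy to confirm on a finite space, where each Radon-Nikodym derivative is a ratio of point masses. The sketch below is only an illustration; the three measures are arbitrary made-up values and `lemma9_density` is a hypothetical helper name.

```python
from fractions import Fraction

# Three made-up measures on a three-point space, given by their point masses.
mu0 = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
mu1 = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]
mu2 = [Fraction(1, 5), Fraction(3, 5), Fraction(1, 5)]


def lemma9_density(f1, f2):
    """Combine f1 = dmu0/d(mu0+mu1) and f2 = dmu0/d(mu0+mu2) as in the proof."""
    denom = f1 + f2 - f1 * f2
    # Where f1 = f2 = 0 the density is defined to be 0, as in the proof.
    return Fraction(0) if denom == 0 else f1 * f2 / denom


for a, b, c in zip(mu0, mu1, mu2):
    f1 = a / (a + b)
    f2 = a / (a + c)
    # Agrees with the direct ratio dmu0/d(mu0 + mu1 + mu2) at this point.
    assert lemma9_density(f1, f2) == a / (a + b + c)
```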

Lemma 10. If $T$ is pairwise sufficient for a finite set $\{\mu_0, \mu_1, \cdots, \mu_k\}$ of
measures on $\mathbf{S}$, then $d\mu_0/d(\sum_{i=0}^{k} \mu_i) \;(\in)\; T^{-1}(\mathbf{T})$.

Proof. For $k = 1$ the conclusion is a restatement of the hypothesis; we
proceed by induction. Given $\mu_0, \mu_1, \cdots, \mu_{k+1}$, we write $\mu = \sum_{i=1}^{k} \mu_i$. Then
$d\mu_0/d(\mu_0 + \mu) \;(\in)\; T^{-1}(\mathbf{T})$ by the induction hypothesis and $d\mu_0/d(\mu_0 + \mu_{k+1}) \;(\in)\;
T^{-1}(\mathbf{T})$ by Corollary 3. Lemma 9 may then be applied to $\{\mu_0, \mu, \mu_{k+1}\}$ and
yields the desired conclusion.

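Lemma 11. If $\{\mu_0, \mu_1, \mu_2, \cdots\}$ is a sequence of measures on $\mathbf{S}$ such that
$\sum_{i=0}^{\infty} \mu_i(X) < \infty$; if, for every $E$ in $\mathbf{S}$, $\mu(E) = \sum_{i=0}^{\infty} \mu_i(E)$; and if $\lambda$ is a measure
on $\mathbf{S}$ such that $\mu_i \ll \lambda$ for $i = 0, 1, 2, \cdots$, then

\[ \lim_k\, d\Big(\sum_{i=0}^{k} \mu_i\Big)\Big/d\lambda = d\mu/d\lambda \quad [\lambda]. \]

Proof. Since $0 \le d(\sum_{i=0}^{k} \mu_i)/d\lambda = \sum_{i=0}^{k} (d\mu_i/d\lambda) \le d\mu/d\lambda$ $[\lambda]$, the series
$\sum_{i=0}^{\infty} (d\mu_i/d\lambda)$ does indeed converge to a measurable function $f$ $[\lambda]$. Since,
for every $E$ in $\mathbf{S}$,

\[ \int_E f\,d\lambda = \sum_{i=0}^{\infty} \int_E \frac{d\mu_i}{d\lambda}\,d\lambda = \sum_{i=0}^{\infty} \mu_i(E) = \mu(E), \]

we have $f = d\mu/d\lambda$ $[\lambda]$, as stated.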


In view of Theorem 1, Lemma 9 asserts that if $T$ is pairwise sufficient for a set $\mathfrak{M}$ of
three elements, then $T$ is sufficient for $\mathfrak{M}$. Lemmas 10 and 12 extend this result to finite
and countably infinite sets $\mathfrak{M}$ respectively. Since every countable set of measures is
dominated, the final result, Theorem 3, contains all these preliminaries as special cases.


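Lemma 12. If $\{\mu_0, \mu_1, \mu_2, \cdots\}$ is a sequence of measures on $\mathbf{S}$ such that
$\sum_{i=0}^{\infty} \mu_i(X) < \infty$ and if, for every $E$ in $\mathbf{S}$, $\mu(E) = \sum_{i=0}^{\infty} \mu_i(E)$, then

\[ \lim_k\, d\mu_0\Big/d\Big(\sum_{i=0}^{k} \mu_i\Big) = d\mu_0/d\mu \quad [\mu]. \]

If, in addition, $T$ is pairwise sufficient for the sequence $\{\mu_0, \mu_1, \cdots\}$, then
$d\mu_0/d\mu \;(\in)\; T^{-1}(\mathbf{T})$.

Proof. We have, for $k = 0, 1, 2, \cdots$,

\[ \frac{d\mu_0}{d(\sum_{i=0}^{k} \mu_i)} \cdot \frac{d(\sum_{i=0}^{k} \mu_i)}{d\mu} = \frac{d\mu_0}{d\mu} \quad [\mu]. \]

If we write $\lambda = \mu$, then the hypotheses of Lemma 11 are satisfied and, consequently,
the second factor on the left side converges to 1 $[\mu]$; it follows that the
first factor converges to $d\mu_0/d\mu$ $[\mu]$. The second assertion of the lemma follows
from Lemma 10.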

Theorem 3. A necessary and sufficient condition that $T$ be sufficient for a
dominated set $\mathfrak{M}$ of measures on $\mathbf{S}$ is that $T$ be pairwise sufficient for $\mathfrak{M}$.

Proof. The necessity of the condition is obvious. To prove its sufficiency,
let $\mathfrak{N} = \{\mu_1, \mu_2, \cdots\}$ be a countable subset of $\mathfrak{M}$ which is equivalent to $\mathfrak{M}$
(Lemma 7), and let $\mu_0$ be an arbitrary measure in $\mathfrak{M}$. Since the sufficiency or
pairwise sufficiency of $T$ remains unaltered if some or all of the measures in $\mathfrak{M}$
are replaced by positive constant multiples of themselves, we may assume that
$\sum_{i=1}^{\infty} \mu_i(X) < \infty$. If we write, for every $E$ in $\mathbf{S}$, $\lambda(E) = \sum_{i=1}^{\infty} \mu_i(E)$, then the
pairwise sufficiency of $T$ and Lemma 12 imply that $d\mu_0/d(\mu_0 + \lambda) \;(\in)\; T^{-1}(\mathbf{T})$.
The relation

\[ \frac{d\mu_0}{d\lambda} = \frac{d\mu_0}{d(\mu_0 + \lambda)} \cdot \frac{d(\mu_0 + \lambda)}{d\lambda} = \frac{d\mu_0}{d(\mu_0 + \lambda)} \Big/ \frac{d\lambda}{d(\mu_0 + \lambda)} = \frac{d\mu_0}{d(\mu_0 + \lambda)} \left( 1 - \frac{d\mu_0}{d(\mu_0 + \lambda)} \right)^{-1} \]

implies that $d\mu_0/d\lambda \;(\in)\; T^{-1}(\mathbf{T})$; an application of Theorem 1 concludes the proof.

A comparison of Theorems 1 and 2 and Corollary 3 yields immediately the 
following consequence of Theorem 3. 

Corollary 4. A necessary and sufficient condition that the statistic $T$ be
sufficient for a dominated set $\mathfrak{M}$ of measures on $\mathbf{S}$ is that, for any two measures
$\mu$ and $\nu$ in $\mathfrak{M}$, $d\mu/d(\mu + \nu) \;(\in)\; T^{-1}(\mathbf{T})$, or, equivalently, $d\mu \mid d\nu \;(\in)\; T^{-1}(\mathbf{T})$.

10. The value of sufficient statistics in statistical methodology. We gather 
from conversations with some able and prominent mathematical statisticians 
that there is doubt and disagreement about just what a sufficient statistic is 
sufficient to do, and in particular about in what sense if any it contains “all the 
information in a sample.” We therefore conclude this paper with a brief 
explanation of a point of view which, while not original with us, has not received 
due publicity. 





Suppose a statistician $\mathcal{S}$ is to be shown an observation $x$ drawn at random
from some sample space $(X, \mathbf{S})$ on which an unknown measure, $\mu$, of a set $\mathfrak{M}$ of
possible measures obtains, while for the same observation $x$ another statistician
$\mathcal{T}$ is only to be shown the value $T(x)$ of some statistic $T$ sufficient for $\mathfrak{M}$. It is
clear that $\mathcal{S}$ is as well off as $\mathcal{T}$; we shall argue that $\mathcal{T}$ is also as well off as $\mathcal{S}$.

Suppose $\mathcal{S}$ has decided how to use his datum, that, in other words, he has
decided just what he will do (or, in particular, say) in the event of each possible $x$.
His program can then be described schematically by saying that he has selected
some function $f$ (of the points $x$) which, without serious loss of generality, may
be supposed to take real values. Now $\mathcal{S}$'s only real concern is for the probability
distribution of $f$ given $\mu$, i.e. for the function $\varphi$ of a real variable $c$, defined by

\[ \varphi(c) = \mu(\{x: f(x) < c\}) = \mu(E(c)). \]

But $\mathcal{T}$ can if he wishes achieve exactly the same results as $\mathcal{S}$, in the following way.
Let him, on learning the value of $T(x)$, select a real number with the aid of a
“random machine” which produces numerical values according to the known
distribution function $\psi$, defined by

\[ \psi(c) = p(E(c) \mid T(x)). \]

Then, for any $\mu$ in $\mathfrak{M}$, the probability that $\mathcal{T}$ will select a value less than $c$ is

\[ \int p(E(c) \mid y)\,d\mu T^{-1}(y) = \mu(E(c)) = \varphi(c). \]

Thus $\mathcal{T}$ is at no disadvantage, save for the mechanical one of having to manipulate
a random machine, and he may fairly be said to have as much information
as $\mathcal{S}$.
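The random-machine argument can be verified exactly in a small finite case. The following is a sketch under assumptions not made in the text: the observation is a sequence of four coin tosses, $T$ is the number of heads, and the decision function `f` is an arbitrary made-up rule. It computes the law of $f(x)$ for the statistician who sees $x$, and the law of the number produced by a machine that is fed only $T(x)$ and draws from the conditional distribution of $f$ given $T$; the two laws agree for each heads ratio tried.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def prob(x, theta):
    """Probability of the 0/1 sequence x when the heads ratio is theta."""
    k = sum(x)
    return theta**k * (1 - theta) ** (len(x) - k)


def f(x):
    """A hypothetical decision rule of the statistician who sees all of x."""
    return x[0] + 2 * x[-1]


def law_of_f(n, theta):
    """Distribution of f(x) under theta, by direct enumeration."""
    law = defaultdict(Fraction)
    for x in product((0, 1), repeat=n):
        law[f(x)] += prob(x, theta)
    return dict(law)


def conditional_law_given_T(n):
    """p(f = v | T = k). Given k heads all arrangements are equally likely,
    so this table does not involve theta."""
    counts = defaultdict(lambda: defaultdict(int))
    for x in product((0, 1), repeat=n):
        counts[sum(x)][f(x)] += 1
    return {k: {v: Fraction(c, sum(d.values())) for v, c in d.items()}
            for k, d in counts.items()}


def law_of_machine(n, theta):
    """Distribution of the machine's output when it is shown only T(x)."""
    cond = conditional_law_given_T(n)
    law = defaultdict(Fraction)
    for x in product((0, 1), repeat=n):
        for v, q in cond[sum(x)].items():
            law[v] += prob(x, theta) * q
    return dict(law)


for theta in (Fraction(1, 2), Fraction(1, 4)):
    assert law_of_f(4, theta) == law_of_machine(4, theta)
```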

As a matter of fact we know of no practical situation in which $\mathcal{T}$ would actually
go to the trouble of using a random machine. There are some situations in
which he should in principle do so, but in which practical statisticians have not,
so far as we know, thought it worth while. If, for example, an outcome consists
of a sequence of $n$ heads and tails resulting from $n$ spins of a coin the heads
ratio of which is known to be either one half or one quarter, then a sufficient
statistic is the number of heads which occur in the sequence. In basing a
decision on the outcome of this program both $\mathcal{S}$ and, to a still greater extent,
$\mathcal{T}$ have (according to Wald’s theory of minimum risk) something to gain by
recourse to a random machine. There are, on the other hand, many technical
desiderata which sufficient statistics meet exactly without recourse to random
machines. Thus, as Blackwell has shown, if $\mathcal{S}$ has an unbiased estimate, $R$,
of some parameter, $\mathcal{T}$ can find a function $R^*$, defined by $R^*(y) = e(R \mid y)$,
which is an unbiased estimate of that parameter, with variance not greater than
that of $R$. More generally, if $R$ is any estimate with finite mean square deviation
from a parameter, then it is easy to show with Blackwell’s methods that $R^*$
has no larger a mean square deviation than $R$. Finally it is a well known fact
that, under suitable hypotheses, if there exists a maximum likelihood estimate $R$
of some parameter, then $R$ depends only on $y$.

D. Blackwell, “Conditional expectation and unbiased sequential estimation,” Annals
of Math. Stat., Vol. 18 (1947), pp. 105-110.
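Blackwell's construction $R^*(y) = e(R \mid y)$ can be carried out by enumeration in the coin example. The sketch below is illustrative only and rests on assumptions not in the text: four tosses, heads ratio one quarter, and the deliberately crude unbiased estimate $R(x) = x_1$ (the first toss). Conditioning on the number of heads gives $R^*$ equal to the proportion of heads, which has the same mean and a variance smaller by the factor $1/n$.

```python
from fractions import Fraction
from itertools import product


def prob(x, theta):
    """Probability of the 0/1 sequence x when the heads ratio is theta."""
    k = sum(x)
    return theta**k * (1 - theta) ** (len(x) - k)


def R(x):
    """A crude unbiased estimate of theta: the outcome of the first toss."""
    return Fraction(x[0])


def R_star(x):
    """e(R | y): the average of R over all sequences with the same number of
    heads (these are equally likely given y, whatever theta is)."""
    n, k = len(x), sum(x)
    same = [z for z in product((0, 1), repeat=n) if sum(z) == k]
    return sum(R(z) for z in same) / len(same)


def mean_and_variance(estimate, n, theta):
    points = list(product((0, 1), repeat=n))
    mean = sum(prob(x, theta) * estimate(x) for x in points)
    var = sum(prob(x, theta) * (estimate(x) - mean) ** 2 for x in points)
    return mean, var


theta = Fraction(1, 4)
mean_R, var_R = mean_and_variance(R, 4, theta)
mean_star, var_star = mean_and_variance(R_star, 4, theta)
assert mean_R == mean_star == theta   # both estimates are unbiased
assert var_star <= var_R              # conditioning does not increase variance
```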

We think that confusion has from time to time been thrown on the subject
by (a) the unfortunate use of the term “sufficient estimate,” (b) the undue
emphasis on the factorability of sufficient statistics, and (c) the assumption
that a sufficient statistic contains all the information in only the technical
sense of “information” as measured by variance.



ON DESIGNING SINGLE SAMPLING INSPECTION PLANS 

By Frank E. Grubbs 

Ballistic Research Laboratories, Aberdeen Proving Ground, Md.

1. Summary. In designing single sampling inspection plans, a problem is to
find the acceptance number, $c$, and the smallest sample size, $n$, such that if the
fraction defective of the material inspected is equal to an acceptable value, $p_1$, a
large percentage, say, 95% of such lots will be accepted under the sample criteria,
whereas if the fraction defective of the material inspected is objectionable and
equal to $p_2$ (where $p_1 < p_2$), then a large percentage, say, 90% of such lots will
be rejected. A solution to this problem for the case where the lot size is large
compared to the sample size is given in this paper and tables are provided for
quick determination of the sample size $n$ and acceptance number $c$.

2. Introduction. In sampling inspection of material one practice is to set an
acceptable quality level $= p_1$, say, such that the consumer desires to accept
practically all (95% or more) of lots of fraction defective $p_1$ or less (and hence
desires to reject at most a maximum of about 5% of lots which are of quality
$p_1$ or better) and to set also an objectionable fraction defective $= p_2$, say, which
represents quality so poor that the consumer cannot afford to accept more than
about 10% of lots of this quality or poorer.* From the standpoint of the
producer, he should have very few rejections, 5% or less, for his submitted lots
the fractions defective of which are equal to or better (less) than $p_1$, whereas
he should be willing and also expect to suffer increasingly more rejections if his
process average percent defective departs from the acceptable quality level $p_1$
toward poor or objectionable quality. In this connection, if we are given $p_1$, an
acceptable quality level, $p_2$, an objectionable percent defective, the risk $\alpha = 5\%$
of rejecting a lot of fraction defective $p_1$, and the risk $\beta = 10\%$ of accepting a
lot of the objectionable fraction defective $p_2$, a problem of importance in single
sampling inspection is to find the smallest sample size $n$ and the acceptance
number $c$ which will approximate closely the protection stated above. Due to
the discrete nature of $n$ and $c$, it is not usually possible to find $n$ and $c$ such that
precisely the above protection is guaranteed; however, it is possible to pick that
single sampling plan which, for all practical purposes, gives the desired protection,
i.e. it is possible to select that single sampling plan which more nearly satisfies
the above protection requirements than any other plan. The values of $n$ and $c$
can be found simply by looking for an entry in Table I below which is close to
$p_1$ and an entry in Table II close to $p_2$ such that column heading $c$ and row
heading $n$ in Table I correspond exactly with the respective column and row
headings in Table II. For the sample sizes $n$, acceptance numbers $c$ and quality
levels $p$ covered in Tables I and II, the above procedure makes unnecessary any
computation of or any approximation to the sample size and acceptance number.
It will be noticed, however, that usually the proper choice of $c$ is clear whereas
some slight judgment may be necessary in selecting $n$.

* When this paper was first presented for publication, the percent defectives $p_1$ and $p_2$
were labeled “Acceptable Quality Level” and “Lot Tolerance Percent Defective,” respectively.
In view of the suggestions of H. G. Romig and H. F. Dodge, strict reference to
these particular terms has been avoided in order that the percent defectives $p_1$ and $p_2$
would appear in a more generalized form. This recommendation is considered especially
desirable in view of the fact that Table I and Table II of the paper are percentage points
of the Binomial Distribution and hence are useful in problems other than that of designing
single sampling inspection plans.

It is remarked also that Tables I and II solve the equivalent problem of
finding $n$ and $c$ in connection with testing the hypothesis $H_0$ that the fraction
defective of the Binomial population sampled is $p_1$ or less as against an alternative
hypothesis $H_1$ which states that the fraction defective of the lot, population,
process, etc., sampled is $p_2$ or greater ($p_2 > p_1$), where $\alpha = .05$ is the maximum
risk of erroneously rejecting $H_0$ when it is true and $\beta = .10$ is the maximum risk
of erroneously accepting $H_0$ when the alternative $H_1$ is true.

The solution to the problem of finding an appropriate single sampling plan 
in this paper is given by solving the infinite case, i.e. by assuming the lot to be an 
infinite Binomial population. In practice lots are of finite size. However, 
it is well known that Binomial probabilities (infinite universe) give excellent 
practical approximations to Hypergeometric probabilities (finite lot) provided 
the sample size is only a small percentage of the lot size. Hence, the reader is 
warned in using the tables for sampling inspection problems that the lot size 
should be at least 10 or 15 times the sample size. 

3. Basis for construction of Table I and Table II. It is well known that if
$P(c, n, p)$ represents the probability of obtaining $c$ or less defectives in a random
sample of size $n$ from a Binomial Population of fraction defective $p$, then the
relation between $P(c, n, p)$ and the Incomplete Beta Function Ratio is given by

\[ P(c, n, p) = I_{1-p}(n - c,\, c + 1) = \frac{1}{\beta(n - c,\, c + 1)} \int_0^{1-p} x^{n-c-1} (1 - x)^{c}\,dx. \tag{1} \]

Consequently, using a table of percentage points for the Incomplete Beta
Function [1], values of $p_1$ can be found for Table I such that

\[ P(c, n, p_1) = .95, \]

and values of $p_2$ can be found for Table II presented at the end such that

\[ P(c, n, p_2) = .10. \]
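Where no table of percentage points is at hand, the two defining equations can also be solved numerically. The sketch below is an assumption-laden illustration rather than the method used to build the tables: it sums the binomial terms directly for $P(c, n, p)$ and finds the root by bisection, relying on the fact that $P(c, n, p)$ decreases from 1 to 0 as $p$ goes from 0 to 1 when $c < n$. The function names and the choice $n = 37$, $c = 1$ are for illustration only.

```python
from math import comb


def accept_prob(c, n, p):
    """P(c, n, p): probability of c or fewer defectives in a sample of n."""
    return sum(comb(n, m) * p**m * (1 - p) ** (n - m) for m in range(c + 1))


def fraction_defective(c, n, target, tol=1e-12):
    """Solve P(c, n, p) = target for p by bisection (requires c < n)."""
    lo, hi = 0.0, 1.0  # P = 1 at p = 0 and P = 0 at p = 1
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if accept_prob(c, n, mid) > target:
            lo = mid  # acceptance still too likely: the root lies at larger p
        else:
            hi = mid
    return (lo + hi) / 2


p1 = fraction_defective(1, 37, 0.95)  # the kind of entry tabulated in Table I
p2 = fraction_defective(1, 37, 0.10)  # the kind of entry tabulated in Table II
print(f"n = 37, c = 1: p1 = {p1:.5f}, p2 = {p2:.5f}")
```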

Also, Table I and Table II can be computed by using percentage points of the
F-distribution [2]. Upon making the transformation

\[ x = \frac{2(n - c)}{2(n - c) + 2(c + 1)F} \]

in (1) above to the F-distribution, we obtain easily that

\[ P(c, n, p) = \frac{1}{\beta(c + 1,\, n - c)} \int_{(n-c)p/((c+1)q)}^{\infty} \frac{[2(c + 1)]^{c+1}\,[2(n - c)]^{n-c}\,F^{c}}{[2(n - c) + 2(c + 1)F]^{n+1}}\,dF, \tag{2} \]

where $q = 1 - p$.

With the aid of a table of percentage points of the F-distribution [2], we may
determine for various combinations of $n - c$ and $c + 1$ those values of $p$ such that

\[ P(c, n, p_1) = .95 \quad \text{for Table I} \]

and

\[ P(c, n, p_2) = .10 \quad \text{for Table II.} \]

In fact, if $P(c, n, p) = \alpha$, then

\[ \frac{(n - c)\,p}{(c + 1)\,q} = F_\alpha\{2(c + 1),\, 2(n - c)\} \]

or

\[ p = \frac{(c + 1)\,F_\alpha\{2(c + 1),\, 2(n - c)\}}{(n - c) + (c + 1)\,F_\alpha\{2(c + 1),\, 2(n - c)\}}, \]

for which relation values of $p_1$ for $\alpha = .95$ are given in Table I below and values of
$p_2$ for $\alpha = .10$ are given in Table II.

Although the 95% points are not given directly in [2], they are easily obtainable
from the relation

\[ F_{.95}(\nu_1, \nu_2) = \frac{1}{F_{.05}(\nu_2, \nu_1)}. \]

Interpolation was required for the great majority of the entries in Tables I 
and II. The values given were obtained by harmonic or linear interpolation 
using References [1] and [2] and are believed accurate to within one unit in the 
last place. 

It will be noticed that if the chosen acceptable quality level, $p_1$, is greater
than the appropriate tabulated value in Table I for the single sampling plan
$(n, c)$, then the operating characteristic curve will pass below the point $(p_1, .95)$.
That is, the risk of rejection under the sampling plan for lots of fraction defective
$p_1$ will be somewhat more than 5%. On the other hand, if a selected acceptable
quality level $p_1$ is less than the appropriate entry in Table I, the risk of rejection
for a product of fraction defective $p_1$ will be less than 5%. Similar considerations
apply also to the fractions defective, $p_2$, in Table II.

4. Single sampling plans based on the Poisson approximation to the binomial. 

Tables I and II are useful for determination of a single sampling plan when the 



DESIGNING SAMPLING PLANS 


245 


desired percent defectives are listed and n does not exceed 150. Table III 
is particularly useful in designing a single sampling plan when we are interested 
in fractions defective not greater than about .10. A somewhat similar procedure 
has already been suggested by Peach and Littauer [3]. If we designate by 
P(c, a) the sum of individual Poisson probabilities, 


P(c, a) = 


—a jn 

e a 


m-O mi 


then Table III gives values oi = npi of a for which 

P(c, at) = .95 

and values ch =■ npj of a for which 

P(c, a») = .10. 

Hence, to find the single sampling plan whose operating characteristic curve 
passes nearly through the points (p_1, .95) and (p_2, .10) one merely divides 
values of a_1 in Table III for various values of c by the acceptable quality level p_1 
and divides values of a_2 in Table III by the objectionable percent defective p_2. 
Then the acceptance number c is picked for which a_1/p_1 most nearly equals a_2/p_2 
and the approximate sample size n may be determined by rounding to an integer 
the average of the two approximately equal numbers a_1/p_1 and a_2/p_2. 
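
The Poisson procedure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`poisson_cdf`, `solve_mean`, `design_plan`) are hypothetical, the means a_1 and a_2 are found by bisection rather than read from Table III, and "most nearly equals" is interpreted here as the smallest absolute difference between a_1/p_1 and a_2/p_2.

```python
import math

def poisson_cdf(c, a):
    """P(c, a): sum of Poisson probabilities for m = 0..c with mean a."""
    term = math.exp(-a)          # m = 0 term
    total = term
    for m in range(1, c + 1):
        term *= a / m            # builds e^{-a} a^m / m! recursively
        total += term
    return total

def solve_mean(c, target, lo=0.0, hi=100.0):
    """Bisection for the mean a with P(c, a) = target (P decreases in a)."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if poisson_cdf(c, mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def design_plan(p1, p2, max_c=15):
    """Pick c where a_1/p_1 and a_2/p_2 are closest; n is their rounded mean.

    The 'smallest absolute gap' rule is an assumed reading of the text.
    """
    best = None
    for c in range(max_c + 1):
        n1 = solve_mean(c, 0.95) / p1   # sample size implied by (p_1, .95)
        n2 = solve_mean(c, 0.10) / p2   # sample size implied by (p_2, .10)
        gap = abs(n1 - n2)
        if best is None or gap < best[0]:
            best = (gap, c, round((n1 + n2) / 2))
    return best[1], best[2]

c, n = design_plan(0.01, 0.10)
print(c, n)
```

For the quality levels p_1 = .01 and p_2 = .10 this sketch selects the same plan as the tabular procedure, since the two implied sample sizes are closest at c = 1.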


5. Example on the use of Tables I, II, III. Given an acceptable percent 
defective or quality level of .01 and an objectionable quality level of .10, it is 
desired to find the single sampling plan which will accept 95% of product which is 
of quality p_1 = .01 and which will reject 90% (or accept only 10%) of product of 
quality p_2 = .10. Looking in Table I for entries p_1 which approximately equal 
.01 and in Table II for entries p_2 which approximately equal .10 such that the 
c and n of Tables I and II correspond, we see that c must be equal to 1 whereas n 
may take any one of the values 35, 36, 37, 38. In this connection, we 
have to set up some criterion for the choice of n. Although any of several criteria 
may be used, a reasonable criterion appears to involve picking n such that the 
sum of the absolute departures of the operating characteristic curve from the 
risks α = .05 at p_1 and β = .10 at p_2 is a minimum. This may be determined by 
using appropriate tables of binomial probabilities or by computing at p_1 and p_2 
the chance of obtaining c or fewer defectives in n for the various possible 
combinations of c and n. If the above criterion is applied to the present example, 
the combination c = 1 and n = 37 is selected, i.e. the single sampling 
plan is c = 1, n = 37. For this sampling plan, the probability of passing 
at p_1 = .01 is .9471 and the probability of passing at p_2 = .10 is .1036. For 
the sake of expediency, another proposal would be merely to select a 
“middle” value of n, especially when the variation in sample size is slight. 
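
The criterion of minimum total absolute departure can be checked directly from the binomial distribution. The following is a minimal sketch, assuming Python 3.8 or later for `math.comb`; the helper names `accept_prob` and `departure` are hypothetical and chosen only for this illustration.

```python
from math import comb

def accept_prob(c, n, p):
    """Chance of c or fewer defectives in a sample of n at fraction defective p."""
    return sum(comb(n, m) * p**m * (1 - p)**(n - m) for m in range(c + 1))

def departure(c, n, p1=0.01, p2=0.10, alpha=0.05, beta=0.10):
    """Sum of absolute departures of the OC curve from (p1, 1 - alpha) and (p2, beta)."""
    return (abs(accept_prob(c, n, p1) - (1 - alpha))
            + abs(accept_prob(c, n, p2) - beta))

# Candidate sample sizes read from Tables I and II for c = 1.
candidates = [35, 36, 37, 38]
best_n = min(candidates, key=lambda n: departure(1, n))

print(best_n)
print(round(accept_prob(1, best_n, 0.01), 4), round(accept_prob(1, best_n, 0.10), 4))
```

Evaluating the four candidates in this way gives the smallest total departure at n = 37, in agreement with the probabilities of passing quoted in the example.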

If we use Table III for the above example, we can select n and c with the aid 





FRANK E. GRUBBS 


of the following simple tabulation of the sample sizes n implied by each acceptance number c: 

    c            0       1       2       3
    a_1/p_1      5.1     35.5    81.8    136.6
    a_2/p_2      23.0    38.9    53.2    66.8

Since the sample sizes “cross” at c = 1, we would select c = 1 and n = 1/2 
(35.5 + 38.9) = 37.2, or n = 37. 

A use of Table I of some practical importance is in determining at a glance 
those values of p for which the probability of obtaining c or less defectives in a 
sample of n is equal to .95. As a matter of fact, a series of tables similar to 
Table I and Table II for which P(c, n, p) = .99, .95, .90, .10, .05, .01 etc. would 
be of considerable practical use. 
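
Entries of the kind suggested here can be generated numerically for any probability level. The sketch below is an assumption-laden illustration, not the method used for the published tables (which were obtained by interpolation in References [1] and [2]): it solves P(c, n, p) = target by bisection, and the name `fraction_defective` is hypothetical.

```python
from math import comb

def binom_cdf(c, n, p):
    """P(c, n, p): probability of c or fewer defectives in a sample of n."""
    return sum(comb(n, m) * p**m * (1 - p)**(n - m) for m in range(c + 1))

def fraction_defective(c, n, target):
    """Solve P(c, n, p) = target for p by bisection (requires 0 <= c < n)."""
    if not 0 <= c < n:
        raise ValueError("need 0 <= c < n for a solution to exist")
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        # P(c, n, p) decreases as p increases, so move the lower bound up
        # while the acceptance probability is still above the target.
        if binom_cdf(c, n, mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# One row of a table for several probability levels, here c = 1, n = 37.
for level in (0.99, 0.95, 0.90, 0.10, 0.05, 0.01):
    print(level, round(fraction_defective(1, 37, level), 5))
```

With target .95 or .10 the function gives values that can be compared against the corresponding entries of Table I or Table II.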

Acknowledgment. The author is indebted to Miss Helen J. Coon for carrying 
out the computations for the tables. 













TABLE I 

Values of p = p_1 such that P(c, n, p_1) = .95 



1 

c 















0 

1 

2 

3 

4 

5 

6 

7 

8 

9 


1 

.0500 










1 

2 

.0253 

.224 









2 

3 

.0170 

.135 

.368 








3 

4 

.0127 

.0976 

.249 

.473 







4 

5 

.0102 

.0764 

.189 

.343 

.549 






5 

6 

.00851 

.0628 

.153 

.271 

.418 

.607 





6 

7 

.00730 

.0534 

.129 

.225 

.341 

.479 

.652 




7 

8 

.00639 

.0464 

.111 

.193 

.289 

.400 

.529 

.688 



8 

9 

.00568 

.0410 

.0978 

.169 

.251 

.345 

.450 

.571 

.717 


9 

10 

.00512 

.0368 

.0873 

.150 

.222 

.304 

.393 

.493 

.606 

.741 

10 

11 

.00465 

.0333 

.0788 

.135 

.200 

.271 

.350 

.436 

.530 

.636 

11 

12 

.00427 

.0305 

.0719 

.123 

.181 

.245 

.315 

.391 

.473 

.562 

12 

13 

.00394 

.0281 

.0660 

.113 

.166 

.224 

.287 

.355 

.427 

.505 

13 

14 

.00366 

.0260 

.0611 

.104 

.153 

.206 

.264 

.325 

.390 

.460 

14 

15 

.00341 

.0242 

.0568 

.0967 

.142 

.191 

.244 

.300 

.360 

.423 

15 

16 

.00320 

.0227 

.0531 

.0903 

.132 

.178 

.227 

.279 

.333 

.391 

16 

17 

.00301 

.0213 

.0499 

.0846 

.124 

.166 

.212 

.260 

.311 

.364 

17 

18 

.00285 

.0201 

.0470 

.0797 

.116 

.156 

.199 

.244 

.291 

.341 

18 

19 

.00270 

.0190 

.0445 

.0753 

.110 

.147 

.188 

.230 

.274 

.320 

19 

20 

.00256 

.0181 

.0422 

.0714 

.104 

.140 

.177 

.217 

.259 

.302 

20 

21 

.00244 

.0172 

.0401 

.0678 

.0988 

.132 

.168 

.206 

.245 

.286 

21 

22 

.00233 

.0164 

.0382 

.0646 

.0941 

.126 

.160 

.196 

.233 

.271 

22 

23 

.00223 

.0157 

.0365 

.0617 

.0898 

.120 

.152 

.186 

.222 

.258 

23 

24 

.00213 

.0150 

.0350 

.0590 

.0859 

.115 

.146 

.178 

.212 

.246 

24 

25 

.00205 

.0144 

.0335 

.0566 

.0823 

.110 

.139 

.170 

.202 

.236 

25 

26 

.00197 

.0138 

.0322 

.0543 

.0790 

.106 

.134 

.163 

.194 

.226 

26 

27 

.00190 

.0133 

.0310 

.0522 

.0759 

.101 

.129 

.157 

.186 

.217 

27 

28 

.00183 

.0128 

.0298 

.0503 

.0731 

.0977 

.124 

.151 

.179 

.208 

28 

29 

.00177 

.0124 

.0288 

.0485 

.0705 

.0942 

.119 

.145 

.172 

.200 

29 

30 

.00171 

.0120 

.0278 

.0469 

.0681 

.0909 

.115 

.140 

.167 

.193 

30 

31 

.00165 

.0116 

.0269 

.0453 

.0658 

.0878 

.111 

.135 

.161 

.187 

31 

32 

.00160 

.0112 

.0260 

.0438 

.0637 

.0850 

.107 

.131 

.155 

.180 

32 

33 

.00155 

.0109 

.0252 

.0425 

.0617 

.0823 

.104 

.127 

.150 

.175 

33 

34 

.00151 

.0106 

.0245 

.0412 

.0598 

.0798 

.101 

.123 

.146 

.169 

34 

35 

.00146 

.0102 

.0238 

.0400 

.0580 

.0774 

.0978 

.119 

.141 

.164 

35 





TABLE I —Continued 


n 

c 

n 

0 

1 

2 

3 

4 

5 

6 

7 

8 

9 

36 

.00142 

.00996 

.0231 

.0389 

.0564 

.0752 

.0950 

.116 

.137 

.159 

36 

37 

.00139 

.00969 

.0225 

.0378 

.0548 

.0731 

.0923 

.112 

.133 

.155 

37 

38 

.00135 

.00943 

.0219 

.0368 

.0533 

.0711 

.0898 

.109 

.130 

.150 

38 

39 

.00131 

.00919 

.0213 

.0358 

.0519 

.0692 

.0874 

.106 

.126 

.146 

39 

40 

.00128 

.00896 

.0208 

.0349 

.0506 

.0674 

.0851 

.104 

.123 

.142 

40 

41 

.00125 

.00874 

.0202 

.0340 

.0493 

.0657 

.0830 

.101 

.120 

.139 

41 

42 

.00122 

.00853 

.0198 

.0332 

.0481 

.0641 

.0809 

.0985 

.117 

.135 

42 

43 

.00119 

.00833 

.0193 

.0324 

.0470 

.0626 

.0790 

.0961 

.114 

.132 

43 

44 

.00117 

.00814 

.0188 

.0317 

.0459 

.0611 

.0771 

.0938 

.111 

.129 

44 

45 

.00114 

.00795 

.0184 

.0309 

.0448 

.0597 

.0754 

.0917 

.109 

.126 

45 

46 

.00111 

.00778 

.0180 

.0302 

.0438 

.0584 

.0737 

.0896 

.106 

.123 

46 

47 

.00109 

.00761 

.0176 

.0296 

.0429 

.0571 

.0720 

.0876 

.104 

.120 

47 

48 

.00107 

.00745 

.0172 

.0290 

.0420 

.0559 

.0705 

.0857 

.101 

.118 

48 

49 

.00105 

.00730 

.0169 

.0284 

.0411 

.0547 

.0690 

.0839 

.0993 

.115 

49 

50 

.00103 

.00715 

.0166 

.0278 

.0402 

.0536 

.0676 

.0822 

.0972 

.113 

50 

51 

.00101 

.00701 

.0162 

.0272 

.0394 

.0525 

.0662 

.0805 

.0953 

.110 

51 

52 

.000986 

.00688 

.0159 

.0267 

.0387 

.0515 

.0649 

.0789 

.0934 

.108 

52 

53 

.000967 

.00675 

.0156 

.0262 

.0379 

.0505 

.0637 

.0774 

.0916 

.106 

53 

54 

.000949 

.00662 

.0153 

.0257 

.0372 

.0495 

.0625 

.0759 

.0898 

.104 

54 

55 

.000932 

.00650 

.0150 

.0252 

.0365 

.0486 

.0613 

.0745 

.0881 

.102 

55 

56 

.000916 

.00638 

.0148 

.0248 

.0358 

.0477 

.0602 

.0731 

.0865 

.100 

56 

57 

.000899 

.00627 

.0145 

.0243 

.0352 

.0468 

.0591 

.0718 

.0849 

.0984 

57 

58 

.000884 

.00616 

.0142 

.0239 

.0346 

.0460 

.0580 

.0705 

.0834 

.0966 

58 

59 

.000869 

.00606 

.0140 

.0235 

.0340 

.0452 

.0570 

.0693 

.0820 

.0949 

59 

60 

.000855 

.00595 

.0138 

.0231 

.0334 

.0445 

.0561 

.0681 

.0806 

.0933 

60 

61 

.000841 

.00586 

.0135 

.0227 

.0329 

.0437 

.0551 

.0670 

.0792 

.0917 

61 

62 

.000827 

.00576 

.0133 

.0223 

.0323 

.0430 

.0542 

.0659 

.0779 

.0902 

62 

63 

.000814 

.00567 

.0131 

.0220 

.0318 

.0423 

.0533 

.0648 

.0766 

.0887 

63 

64 

.000801 

.00558 

.0129 

.0216 

.0313 

.0416 

.0525 

.0637 

.0754 

.0873 

64 

65 

.000789 

.00549 

.0127 

.0213 

.0308 

.0410 

.0516 

.0627 

.0742 

.0859 

65 

66 

.000777 

.00541 

.0125 

.0210 

.0303 

.0403 

.0508 

.0618 

.0730 

.0846 

66 

67 

.000765 

.00533 

.0123 

.0206 

.0299 

.0397 

.0501 

.0608 

.0719 

.0833 

67 

68 

.000754 

.00525 

.0121 

.0203 

.0294 

.0391 

.0493 

.0599 

.0708 

.0820 

68 

69 

.000743 

.00517 

.0120 

.0200 

.0290 

.0385 

.0486 

.0590 

.0698 

.0808 

69 

70 

.000733 

.00510 

.0118 

.0198 

.0286 

.0380 

.0479 

.0582 

.0687 

.0796 

70 






TABLE I —Continued 







c 





1 

n 


0 

1 

2 

3 

4 

5 

6 

7 

8 

9 

71 

.000722 

.00503 

.0116 

.0195 

.0282 

.0374 

.0472 

.0573 

.0678 

.0785 

71 

72 

.000712 

.00496 

.0115 

.0192 

.0278 

.0369 

.0465 

.0565 

.0668 

.0773 

72 

73 

.000702 

.00489 

.0113 

.0189 

.0274 

.0364 

.0459 

.0557 

.0658 

.0762 

73 

74 

.000693 

.00482 

.0111 

.0187 

.0270 

.0359 

.0452 

.0549 

.0649 

.0752 

74 

75 

.000684 

.00476 

.0110 

.0184 

.0266 

.0354 

.0446 

.0542 

.0641 

.0742 

75 

76 

.000675 

.00470 

.0108 

.0182 

.0263 

.0349 

.0440 

.0535 

.0632 

.0732 

76 

77 

.000666 

.00463 

.0107 

.0179 

.0259 

.0345 

.0434 

.0528 

.0623 

.0722 

77 

78 

.000657 

.00457 

.0106 

.0177 

.0256 

.0340 

.0429 

.0521 

.0615 

.0712 

78 

79 

.000649 

.00452 

.0104 

.0175 

.0253 

.0336 

.0423 

.0514 

.0607 

.0703 

79 

80 

.000641 

.00446 

.0103 

.0173 

.0249 

.0332 

.0418 

.0507 

.0600 

.0694 

80 

81 

.000633 

.00440 

.0102 

.0170 

.0246 

.0328 

.0413 

.0501 

.0592 

.0685 

81 

82 

.000625 

.00435 

.0100 

.0168 

.0243 

.0323 

.0408 

.0495 

.0585 

.0677 

82 

83 

.000618 

.00430 

.00992 

.0166 

.0240 

.0319 

.0403 

.0489 

.0577 

.0668 

83 

84 

.000610 

.00425 

.00980 

.0164 

.0237 

.0316 

.0398 

.0483 

.0570 

.0660 

84 

85 

.000603 

.00420 

.00969 

.0162 

.0235 

.0312 

.0393 

.0477 

.0564 

.0652 

85 

86 

.000596 

.00415 

.00957 

.0160 

.0232 

.0308 

.0388 

.0471 

.0557 

.0645 

86 

87 

.000589 

.00410 

.00946 

.0159 

.0229 

.0305 

.0384 

.0466 

.0550 

.0637 

87 

88 

.000583 

.00405 

.00936 

.0157 

.0227 

.0301 

.0379 

.0460 

.0544 

.0630 

88 

89 

.000576 

.00401 

.00925 

.0155 

.0224 

.0298 

.0375 

.0455 

.0538 

.0622 

89 

90 

.000570 

.00396 

.00915 

.0153 

.0221 

.0294 

.0371 

.0450 

.0532 

.0615 

90 

91 

.000564 

.00392 

.00904 

.0152 

.0219 

.0291 

.0367 

.0445 

.0526 

.0608 

91 

92 

.000557 

.00388 

.00895 

.0150 

.0217 

.0288 

.0363 

.0440 

.0520 

.0602 

92 

93 

.000551 

.00383 

.00885 

.0148 

.0214 

.0285 

.0359 

.0435 

.0514 

.0595 

93 

94 

.000546 

.00379 

.00875 

.0147 

.0212 

.0282 

.0355 

.0431 

.0509 

.0589 

94 

95 

.000540 

.00375 

.00866 

.0145 

.0210 

.0279 

.0351 

.0426 

.0503 

.0582 

95 

96 

.000534 

.00371 

.00857 

.0144 

.0207 

.0276 

.0347 

.0421 

.0498 

.0576 

96 

97 

.000529 

.00368 

.00848 

.0142 

.0205 

.0273 

.0344 

.0417 

.0493 

.0570 

97 

98 

.000523 

.00364 

.00840 

.0141 

.0203 

.0270 

.0340 

.0413 

.0487 

.0564 

98 

99 

.000518 

.00360 

.00831 

.0139 

.0201 

.0267 

.0337 

.0408 

.0482 

.0558 

99 

100 

.000513 

.00357 

.00823 

.0138 

.0199 

.0265 

.0333 

.0404 

.0478 

.0553 

100 

101 

.000508 

.00353 

.00814 

.0136 

.0197 

.0262 

.0330 

.0400 

.0473 

.0547 

101 

102 

.000503 

.00350 

.00806 

.0135 

.0195 

.0259 

.0327 

.0396 

.0468 

.0542 

102 

103 

.000498 

.00346 

.00799 

.0134 

.0193 

.0257 

.0323 

.0392 

.0463 

.0536 

103 

104 

.000493 

.00343 

.00791 

.0132 

.0191 

.0254 

.0320 

.0389 

.0459 

.0531 

104 

105 

.000488 

.00339 

.00783 

.0131 

.0189 

.0252 

.0317 

.0385 

.0454 

.0526 

105 





TABLE I —Continued 



C 

n 


0 

1 

2 

3 

4 

5 

6 

7 

8 

9 

106 

.000484 

.00336 

.00776 

.0130 

.0188 

.0249 

.0314 

.0381 

.0450 

.0521 

106 

107 

.000479 

.00333 

.00768 

.0129 

.0186 

.0247 

.0311 

.0378 

.0446 

.0516 

107 

108 

.000475 

.00330 

.00761 

.0127 

.0184 

.0245 

.0308 

.0374 

.0442 

.0511 

108 

109 

.000470 

.00327 

.00754 

.0126 

.0182 

.0242 

.0305 

.0370 

.0438 

.0506 

109 

110 

.000466 

.00324 

.00747 

.0125 

.0181 

.0240 

.0302 

.0367 

.0433 

.0502 

110 

111 

.000462 

.00321 

.00741 

.0124 

.0179 

.0238 

.0300 

.0364 

.0430 

.0497 

111 

112 

.000458 

.00318 

.00734 

.0123 

.0178 

.0236 

.0297 

.0360 

.0426 

.0492 

112 

113 

.000454 

.00315 

.00727 

.0122 

.0176 

.0234 

.0294 

.0357 

.0422 

.0488 

113 

114 

.000450 

.00313 

.00721 

.0121 

.0174 

.0232 

.0292 

.0354 

.0418 

.0484 

114 

115 

.000446 

.00310 

.00715 

.0120 

.0173 

.0230 

.0289 

.0351 

.0414 

.0479 

115 

116 

.000442 

.00307 

.00709 

.0119 

.0171 

.0228 

.0287 

.0348 

.0411 

.0475 

116 

117 

.000438 

.00305 

.00702 

.0118 

.0170 

.0226 

.0284 

.0345 

.0407 

.0471 

117 

118 

.000435 

.00302 

.00696 

.0117 

.0168 

.0224 

.0282 

.0342 

.0404 

.0467 

118 

119 

.000431 

.00299 

.00691 

.0116 

.0167 

.0222 

.0279 

.0339 

.0400 

.0463 

119 

120 

.000427 

.00297 

.00685 

.0115 

.0166 

.0220 

.0277 

.0336 

.0397 

.0459 

120 

121 

.000424 

.00294 

.00679 

.0114 

.0164 

.0218 

.0275 

.0333 

.0394 

.0455 

121 

122 

.000420 

.00292 

.00674 

.0113 

.0163 

.0216 

.0272 

.0330 

.0390 

.0451 

122 

123 

.000417 

.00290 

.00668 

.0112 

.0162 

.0215 

.0270 

.0328 

.0387 

.0448 

123 

124 

.000414 

.00287 

.00663 

.0111 

.0160 

.0213 

.0268 

.0325 

.0384 

.0444 

124 

125 

.000410 

.00285 

.00657 

.0110 

.0159 

.0211 

.0266 

.0322 

.0381 

.0440 

125 

126 

.000407 

.00283 

.00652 

.0109 

.0158 

.0209 

.0264 

.0320 

.0378 

.0437 

126 

127 

.000404 

.00281 

.00647 

.0108 

.0156 

.0208 

.0262 

.0317 

.0375 

.0433 

127 

128 

.000401 

.00278 

.00642 

.0107 

.0155 

.0206 

.0259 

.0315 

.0372 

.0430 

128 

129 

.000398 

.00276 

.00637 

.0107 

.0154 

.0204 

.0257 

.0312 

.0369 

.0427 

129 

130 

.000394 

.00274 

.00632 

.0106 

.0153 

.0203 

.0255 

.0310 

.0366 

.0423 

130 

131 

.000391 

.00272 

.00627 

.0105 

.0152 

.0201 

.0253 

.0308 

.0363 

.0420 

131 

132 

.000389 

.00270 

.00622 

.0104 

.0150 

.0200 

.0252 

.0305 

.0360 

.0417 

132 

133 

.000386 

.00268 

.00618 

.0103 

.0149 

.0198 

.0250 

.0303 

.0358 

.0414 

133 

134 

.000383 

.00266 

.00613 

.0103 

.0148 

.0197 

.0248 

.0301 

.0355 

.0410 

134 

135 

.000380 

.00264 

.00608 

.0102 

.0147 

.0195 

.0246 

.0298 

.0352 

.0407 

135 

136 

.000377 

.00262 

.00604 

.0101 

.0146 

.0194 

.0244 

.0296 

.0350 

.0404 

136 

137 

.000374 

.00260 

.00599 

.0100 

.0145 

.0192 

.0242 

.0294 

.0347 

.0401 

137 

138 

.000372 

.00258 

.00595 

.00996 

.0144 

.0191 

.0240 

.0292 

.0344 

.0398 

138 

139 

.000369 

.00256 

.00591 

.00989 

.0143 

.0190 

.0239 

.0290 

.0342 

.0395 

139 

140 

.000366 

.00254 

.00587 

.00982 

.0142 

.0188 

.0237 

.0288 

.0339 

.0393 

140 




TABLE I —Concluded 


n 

c 

n 

0 

1 

2 

3 

4 

5 

6 

7 

8 

9 

141 

.000364 

.00253 

.00582 

.00975 

.0141 

.0187 

.0235 

.0285 

.0337 

.0390 

141 

142 

.000361 

.00251 

.00578 

.00968 

.0140 

.0186 

.0234 

.0283 

.0335 

.0387 

142 

143 

.000359 

.00249 

.00574 

.00961 

.0139 

.0184 

.0232 

.0281 

.0332 

.0384 

143 

144 

.000356 

.00247 

.00570 

.00954 

.0138 

.0183 

.0230 

.0279 

.0330 

.0382 

144 

145 

.000354 

.00246 

.00566 

.00948 

.0137 

.0182 

.0229 

.0278 

.0328 

.0379 

145 

146 

.000351 

.00244 

.00562 

.00941 

.0136 

.0180 

.0227 

.0276 

.0325 

.0376 

146 

147 

.000349 

.00242 

.00559 

.00935 

.0135 

.0179 

.0226 

.0274 

.0323 

.0374 

147 

148 

.000346 

.00241 

.00555 

.00928 

.0134 

.0178 

.0224 

.0272 

.0321 

.0371 

148 

149 

.000344 

.00239 

.00551 

.00922 

.0133 

.0177 

.0223 

.0270 

.0319 

.0369 

149 

150 

.000342 

.00237 

.00547 

.00916 

.0132 

.0176 

.0221 

.0268 

.0317 

.0366 

150 



TABLE II 

Values of p = p_2 such that P(c, n, p_2) = .10 









0 

1 

2 

3 

4 

1 

.900 





2 

.684 

.949 




3 

.536 

.804 

.965 



4 

.438 

.680 

.857 

.974 


5 

.369 

.584 

.753 

.888 

.979 

6 

.319 

.510 

.667 

.799 

.907 

7 

.280 

.453 

.596 

.721 

.830 

8 

.250 

.406 

.538 

.655 

.760 

9 

.226 

.368 

.490 

.599 

.699 

10 

.206 

.337 

.450 

.552 

.646 

11 

.189 

.310 

.415 

.511 

.599 

12 

.175 

.288 

.386 

.475 

.559 

13 

.162 

.268 

.360 

.444 

.523 

14 

.152 

.251 

.337 

.417 

.492 

15 

.142 

.236 

.317 

.393 

.464 

16 

.134 

.222 

.300 

.371 

.439 

17 

.127 

.210 

.284 

.352 

.416 

18 

.120 

.199 

.269 

.334 

.396 

19 

.114 

.190 

.257 

.319 

.378 

20 

.109 

.181 

.245 

.304 

.361 

21 

.104 

.173 

.234 

.291 

.345 

22 

.0994 

.166 

.224 

.279 

.331 

23 

.0953 

.159 

.215 

.268 

.318 

24 

.0915 

.153 

.207 

.258 

.306 

25 

.0880 

.147 

.199 

.248 

.295 

26 

.0847 

.142 

.192 

.239 

.284 

27 

.0817 

.137 

.185 

.231 

.275 

28 

.0789 

.132 

.179 

.223 

.265 

29 

.0763 

.128 

.173 

.216 

.257 

30 

.0739 

.124 

.168 

.209 

.249 

31 

.0716 

.120 

.163 

.203 

.241 

32 

.0694 

.116 

.158 

.197 

.234 

33 

.0674 

.113 

.153 

.191 

.228 

34 

.0655 

.110 

.149 

.186 

.221 

35 

.0637 

.107 

.145 

.181 

.216 


n 


5 

6 

7 

8 

9 







1 






2 






3 






4 






5 

.983 





6 

.921 

.985 




7 

.853 

.931 

.987 



8 

.790 

.871 

.939 

.988 


9 

.733 

.812 

.884 

.945 

.990 

10 

.682 

.759 

.831 

.895 

.951 

11 

.638 

.712 

.781 

.840 

.904 

12 

.598 

.669 

.736 

.799 

.858 

13 

.563 

.631 

.695 

.757 

.815 

14 

.532 

.596 

.658 

.718 

.774 

15 

.504 

.565 

.625 

.682 

.737 

16 

.478 

.537 

.594 

.650 

.703 

17 

.455 

.512 

.567 

.620 

.671 

18 

.434 

.489 

.541 

.592 

.642 

19 

.415 

.467 

.518 

.567 

.615 

20 

.397 

.448 

.497 

.544 

.590 

21 

.381 

.430 

.477 

.523 

.568 

22 

.366 

.413 

.459 

.503 

.546 

23 

.352 

.398 

.442 

.485 

.526 

24 

.340 

.383 

.426 

.467 

.508 

25 

.328 

.370 

.411 

.451 

.491 

26 

.317 

.358 

.397 

.436 

.475 

27 

.306 

.346 

.385 

.422 

.459 

28 

.297 

.335 

.372 

.409 

.445 

29 

.288 

.325 

.361 

.397 

.432 

30 

.279 

.315 

.350 

.385 

.419 

31 

.271 

.306 

.340 

.374 

.407 

32 

.263 

.297 

.331 

.364 

.396 

33 

.256 

.289 

.322 

.354 

.385 

34 

.249 

.282 

.313 

.345 

.375 

35 





TABLE II —Continued 


n 





c 







0 

1 

2 

3 

4 

5 

6 

7 

8 

9 


36 

.0620 

.104 

.141 

.176 

.210 

.242 

.274 

.305 

.336 

.366 

36 

37 

.0603 

.101 

.138 

.172 

.205 

.236 

.267 

.298 

.327 

.357 

37 

38 

.0588 

.0985 

.134 

.167 

.199 

.230 

.261 

.290 

.319 

.348 

38 

39 

.0573 

.0961 

.131 

.163 

.195 

.225 

.254 

.283 

.312 

.340 

39 

40 

.0559 

.0938 

.128 

.159 

.190 

.220 

.248 

.277 

.305 

.332 

40 

41 

.0546 

.0916 

.125 

.156 

.186 

.215 

.242 

.270 

.298 

.324 

41 

42 

.0533 

.0895 

.122 

.152 

.181 

.210 

.237 

.264 

.291 

.317 

42 

43 

.0521 

.0875 

.119 

.149 

.177 

.205 

.232 

.259 

.285 

.310 

43 

44 

.0510 

.0856 

.116 

.146 

.174 

.201 

.227 

.253 

.279 

.304 

44 

45 

.0499 

.0837 

.114 

.142 

.170 

.196 

.222 

.248 

.273 

.297 

45 

46 

.0488 

.0819 

.112 

.140 

.166 

.192 

.218 

.243 

.268 

.291 

46 

47 

.0478 

.0803 

.109 

.137 

.163 

.188 

.213 

.238 

.262 

.285 

47 

48 

.0468 

.0786 

.107 

.134 

.160 

.185 

.209 

.233 

.257 

.280 

48 

49 

.0459 

.0771 

.105 

.131 

.157 

.181 

.205 

.229 

.252 

.274 

49 

50 

.0450 

.0756 

.103 

.130 

.154 

.178 

.201 

.224 

.248 

.269 

50 

51 

.0441 

.0741 

.101 

.126 

.151 

.174 

.197 

.220 

.243 

.264 

51 

52 

.0433 

.0728 

.0991 

.124 

.148 

.171 

.194 

.216 

.239 

.259 

52 

53 

.0425 

.0714 

.0973 

.122 

.145 

.168 

.190 

.212 

.235 

.255 

53 

54 

.0417 

.0701 

.0956 

.120 

.143 

.165 

.187 

.208 

.230 

.250 

54 

55 

.0410 

.0689 

.0939 

.117 

.140 

.162 

.184 

.205 

.227 

.246 

55 

56 

.0403 

.0677 

.0923 

.115 

.138 

.159 

.180 

.201 

.223 

.242 

56 

57 

.0396 

.0665 

.0907 

.113 

.135 

.157 

.177 

.198 

.219 

.238 

57 

58 

.0389 

.0654 

.0892 

.112 

.133 

.154 

.175 

.195 

.216 

.234 

58 

59 

.0383 

.0643 

.0877 

.110 

.131 

.152 

.172 

.191 

.212 

.230 

59 

60 

.0376 

.0633 

.0863 

.108 

.129 

.149 

.169 

.188 

.209 

.226 

60 

61 

.0370 

.0623 

.0849 

.106 

.127 

.147 

.166 

.185 

.206 

.223 

61 

62 

.0365 

.0613 

.0836 

.105 

.125 

.145 

.164 

.183 

.203 

.219 

62 

63 

.0359 

.0603 

.0823 

.103 

.123 

.142 

.161 

.180 

.200 

.216 

63 

64 

.0353 

.0594 

.0810 

.101 

.121 

.140 

.159 

.177 

.197 

.213 

64 

65 

.0348 

.0585 

.0798 

.0999 

.119 

.138 

.156 

.174 

.194 

.210 

65 

66 

.0343 

.0577 

.0786 

.0984 

.117 

.136 

.154 

.172 

.191 

.207 

66 

67 

.0338 

.0568 

.0775 

.0970 

.116 

.134 

.152 

.169 

.188 

.204 

67 

68 

.0333 

.0560 

.0764 

.0956 

.114 

.132 

.150 

.167 

.185 

.201 

68 

69 

.0328 

.0552 

.0753 

.0943 

.113 

.130 

.148 

.165 

.182 

.198 

69 

70 

.0324 

.0544 

.0743 

.0930 

.111 

.128 

.146 

.162 

.179 

.195 

70 






TABLE II —Continued 


n 

c 

n 

0 

1 

2 

3 

4 

5 

6 

7 

8 

9 


71 

.0319 

.0537 

.0732 

.0917 

.109 

.127 

.144 

.160 

.177 

.193 

71 

72 

.0315 

.0530 

.0722 

.0904 

.108 

.125 

.142 

.158 

.174 

.190 

72 

73 

.0310 

.0522 

.0713 

.0892 

.107 

.123 

.140 

.156 

.172 

.188 

73 

74 

.0306 

.0516 

.0703 

.0881 

.105 

.122 

.138 

.154 

.170 

.185 

74 

75 

.0302 

.0509 

.0694 

.0869 

.104 

.120 

.136 

.152 

.167 

.183 

75 

76 

.0298 

.0502 

.0685 

.0858 

.102 

.119 

.134 

.150 

.165 

.180 

76 

77 

.0295 

.0496 

.0676 

.0847 

.101 

.117 

.133 

.148 

.163 

.178 

77 

78 

.0291 

.0490 

.0668 

.0836 

.0999 

.116 

.131 

.146 

.161 

.176 

78 

79 

.0287 

.0483 

.0660 

.0826 

.0987 

.114 

.130 

.145 

.159 

.174 

79 

80 

.0284 

.0478 

.0652 

.0816 

.0974 

.113 

.128 

.143 

.157 

.172 

80 

81 

.0280 

.0472 

.0644 

.0806 

.0963 

.111 

.126 

.141 

.155 

.170 

81 

82 

.0277 

.0466 

.0636 

.0797 

.0951 

.110 

.125 

.139 

.154 

.168 

82 

83 

.0274 

.0461 

.0629 

.0787 

.0940 

.109 

.123 

.138 

.152 

.166 

83 

84 

.0270 

.0455 

.0621 

.0778 

.0929 

.108 

.122 

.136 

.150 

.164 

84 

85 

.0267 

.0450 

.0614 

.0769 

.0918 

.106 

.121 

.135 

.148 

.162 

85 

86 

.0264 

.0445 

.0607 

.0760 

.0908 

.105 

.119 

.133 

.147 

.160 

86 

87 

.0261 

.0440 

.0600 

.0752 

.0898 

.104 

.118 

.132 

.145 

.158 

87 

88 

.0258 

.0435 

.0594 

.0743 

.0888 

.103 

.117 

.130 

.143 

.157 

88 

89 

.0255 

.0430 

.0587 

.0735 

.0878 

.102 

.115 

.129 

.142 

.155 

89 

90 

.0253 

.0425 

.0581 

.0727 

.0869 

.101 

.114 

.127 

.140 

.153 

90 

91 

.0250 

.0421 

.0574 

.0719 

.0859 

.0995 

.113 

.126 

.139 

.152 

91 

92 

.0247 

.0416 

.0568 

.0712 

.0850 

.0985 

.112 

.125 

.137 

.150 

92 

93 

.0245 

.0412 

.0562 

.0704 

.0841 

.0974 

.110 

.123 

.136 

.148 

93 

94 

.0242 

.0408 

.0556 

.0697 

.0832 

.0964 

.109 

.122 

.135 

.147 

94 

95 

.0239 

.0403 

.0551 

.0690 

.0824 

.0954 

.108 

.121 

.133 

.145 

95 

96 

.0237 

.0399 

.0545 

.0683 

.0815 

.0945 

.107 

.120 

.132 

.144 

96 

97 

.0235 

.0395 

.0539 

.0676 

.0807 

.0935 

.106 

.118 

.131 

.143 

97 

98 

.0232 

.0391 

.0534 

.0669 

.0799 

.0926 

.105 

.117 

.129 

.141 

98 

99 

.0230 

.0387 

.0529 

.0662 

.0791 

.0917 

.104 

.116 

.128 

.140 

99 

100 

.0228 

.0383 

.0524 

.0656 

.0784 

.0908 

.103 

.115 

.127 

.138 

100 

101 

.0225 

.0380 

.0518 

.0650 

.0776 

.0899 

.102 

.114 

.125 

.137 

101 

102 

.0223 

.0376 

.0513 

.0643 

.0768 

.0890 

.101 

.113 

.124 

.136 

102 

103 

.0221 

.0372 

.0508 

.0637 

.0761 

.0882 

.100 

.112 

.123 

.134 

103 

104 

.0219 

.0369 

.0504 

.0631 

.0754 

.0874 

.0991 

.111 

.122 

.133 

104 

105 

.0217 

.0365 

.0499 

.0625 

.0747 

.0865 

.0981 

.110 

.121 

.132 

105 





TABLE II —Continued 


n 





c 








0 

1 

2 

3 

4 

5 

6 

7 

8 

9 

106 

.0215 


.0494 

.0619 


.0857 

.0972 

.109 

.120 

.131 

106 


107 

.0213 

.0359 

.0490 

.0614 

.0733 

.0850 

.0964 

.108 

.119 

.130 

107 

108 

.0211 

.0355 

.0485 

.0608 

.0727 

.0842 

.0955 

.107 

.118 

.128 

108 

109 

.0209 

.0352 

.0481 

.0603 

.0720 

.0834 

.0946 

.106 

.116 

.127 

109 

110 

.0207 

.0349 

.0477 

.0597 

.0714 

.0827 

.0938 

.105 

.115 

.126 

110 

111 

.0205 

.0346 

.0472 

.0592 

.0707 

.0820 

.0930 

.104 

.114 

.125 

111 

112 

.0204 

.0343 

.0468 

.0587 

.0701 

.0812 

.0921 

.103 

.113 

.124 

112 

113 

.0202 

.0340 

.0464 

.0582 

.0695 

.0805 

.0913 

.102 

.112 

.123 

113 

114 

.0200 

.0337 

.0460 

.0577 

.0689 

.0798 

.0906 

.101 

.111 

.122 

114 

115 

.0198 

.0334 

.0456 

.0572 

.0683 

.0792 

.0898 

.100 

.111 

.121 

115 

116 

.0197 

.0331 

.0452 

.0567 

.0677 

.0785 

.0890 

.0994 

.110 

.120 

116 

117 

.0195 

.0328 

.0449 

.0562 

.0672 

.0778 

.0883 

.0986 

.109 

.119 

117 

118 

.0193 

.0326 

.0445 

.0557 

.0666 

.0772 

.0875 

.0977 

.108 

.118 

118 

119 

.0192 

.0323 

.0441 

.0553 

.0661 

.0765 

.0868 

.0969 

.107 

.117 

119 

120 

.0190 

.0320 

.0437 

.0548 

.0655 

.0759 

.0861 

.0961 

.106 

.116 

120 

121 

.0189 

.0318 

.0434 

.0544 

.0650 

.0753 

.0854 

.0954 

.105 

.115 

121 

122 

.0187 

.0315 

.0430 

.0539 

.0645 

.0747 

.0847 

.0946 

.104 

.114 

122 

123 

.0185 

.0313 

.0427 

.0535 

.0639 

.0741 

.0841 

.0938 

.104 

.113 

123 

124 

.0184 

.0310 

.0424 

.0531 

.0634 

.0735 

.0834 

.0931 

.103 

.112 

124 

125 

.0183 

.0308 

.0420 

.0527 

.0629 

.0729 

.0827 

.0924 

.102 

.111 

125 

126 

.0181 

.0305 

.0417 

.0523 

.0624 

.0724 

.0821 

.0917 

.101 

.110 

126 

127 

.0180 

.0303 

.0414 

.0519 

.0620 

.0718 

.0815 

.0909 

.100 

.110 

127 

128 

.0178 

.0301 

.0410 

.0515 

.0615 

.0713 

.0808 

.0902 

.0995 

.109 

128 

129 

.0177 

.0298 

.0407 

.0511 

.0610 

.0707 

.0802 

.0896 

.0988 

.108 

129 

130 

.0176 

.0296 

.0404 

.0507 

.0606 

.0702 

.0796 

.0889 

.0980 

.107 

130 

131 

.0174 

.0294 

.0401 

.0503 

.0601 

.0696 

.0790 

.0882 

.0973 

.106 

131 

132 

.0173 

.0291 

.0398 

.0499 

.0596 

.0691 

.0784 

.0876 

.0966 

.105 

132 

133 

.0172 

.0289 

.0395 

.0495 

.0592 

.0686 

.0778 

.0869 

.0959 

.105 

133 

134 

.0170 

.0287 

.0392 

.0492 

.0588 

.0681 

.0773 

.0863 

.0952 

.104 

134 

135 

.0169 

.0285 

.0389 

.0488 

.0583 

.0676 

.0767 

.0857 

.0945 

.103 

135 

136 

.0168 

.0283 

.0387 

.0485 

.0579 

.0671 

.0762 

.0850 

.0938 

.102 

136 

137 

.0167 

.0281 

.0384 

.0481 

.0575 

.0666 

.0756 

.0844 

.0931 

.102 

137 

138 

.0165 

.0279 

.0381 

.0478 

.0571 

.0662 

.0751 

.0838 

.0925 

.101 

138 

139 

.0164 

.0277 

.0378 

.0474 

.0567 

.0657 

.0745 

.0832 

.0918 

.100 

139 

140 

.0163 

.0275 

.0376 

.0471 

.0563 

.0652 

.0740 

.0826 

.0912 

.0996 

140 




TABLE II —Concluded 


n 





c 







0 

1 

2 

3 

4 

5 

6 

7 

8 

9 


141 

.0162 

.0273 

.0373 

.0468 

.0559 

.0648 

.0735 

.0821 

.0905 

.0989 

141 

142 

.0161 

.0271 

.0370 

.0464 

.0555 

.0643 

.0730 

.0815 

.0899 

.0982 

142 

143 

.0160 

.0269 

.0368 

.0461 

.0551 

.0639 

.0725 

.0809 

.0893 

.0975 

143 

144 

.0159 

.0267 

.0365 

.0458 

.0547 

.0635 

.0720 

.0804 

.0887 

.0969 

144 

145 

.0158 

.0266 

.0363 

.0455 

.0544 

.0630 

.0715 

.0798 

.0881 

.0962 

145 

146 

.0156 

.0264 

.0360 

.0452 

.0540 

.0626 

.0710 

.0793 

.0875 

.0956 

146 

147 

.0155 

.0262 

.0358 

.0449 

.0536 

.0622 

.0705 

.0788 

.0869 

.0949 

147 

148 

.0154 

.0260 

.0356 

.0446 

.0533 

.0618 

.0701 

.0783 

.0863 

.0943 

148 

149 

.0153 

.0259 

.0353 

.0443 

.0529 

.0614 

.0696 

.0777 

.0858 

.0937 

149 

150 

.0152 

.0257 

.0351 

.0440 

.0526 

.0610 

.0692 

.0772 

.0852 

.0931 

150 


TABLE III 

(Based on Poisson approximation to the binomial distribution) 

    Acceptance     Values of a_1 = np_1     Values of a_2 = np_2
    number c       for which                for which
                   P(c, a_1) = .95          P(c, a_2) = .10

        0              .05129                   2.303
        1              .3554                    3.890
        2              .8177                    5.322
        3             1.366                     6.681
        4             1.970                     7.994
        5             2.613                     9.276
        6             3.285                    10.53
        7             3.981                    11.77
        8             4.695                    12.99
        9             5.425                    14.21
       10             6.169                    15.41
       11             6.924                    16.60
       12             7.690                    17.78
       13             8.464                    18.96
       14             9.246                    20.13
       15            10.04                     21.29

REFERENCES 

[1] Catherine M. Thompson, “Tables of percentage points of the incomplete beta 
function,” Biometrika, Vol. 32 (1944), pp. 151-181. 

[2] Maxine Merrington and Catherine M. Thompson, “Tables of percentage points 
of the inverted beta (F) distribution,” Biometrika, Vol. 33 (1945), pp. 77-88. 

[3] Paul Peach and S. B. Littauer, “A note on sampling inspection,” Annals of Math. 
Stat., Vol. 17 (1946), pp. 81-84. 



ON THE RANGE-MIDRANGE TEST AND SOME TESTS WITH BOUNDED 

SIGNIFICANCE LEVELS¹ 

By John E. Walsh 
The RAND Corporation 

1. Summary. This paper is divided into two parts. The significance tests 
investigated in Part I concern the population mean and are based on the quantity 

[(sample midrange) - (hypothetical mean)]/(sample range). 

The case in which the observations are a sample from a normal population is 
considered in detail. The tests investigated are summarized in Table 1. These 
tests are found to be very efficient for small samples (see Table 4; power efficiency 
is defined in section 3). An investigation of several extremely non-normal 
populations using the values of D_α obtained for normality indicates that the 
significance level of the range-midrange test is not very sensitive to the 
requirement of normality for small samples (see Table 6). Also the tests of Table 1 
can be applied without computation through the use of an easily constructed 
graph (see section 4). These properties suggest that the range-midrange test is 
preferable to the Student t-test and the analogue of the Student t-test using the 
sample range (see [1] and [2]) whenever the sample size is sufficiently small. 

Use of the range-midrange test for the case of normality was proposed by E. S. 
Pearson in [3], where properties of the test were experimentally investigated 
for the normal and certain non-normal populations. 

In Part II several significance tests for the mean are developed which have a
specified significance level for the case of a sample from a normal population
but whose significance level is bounded near the specified value under very
general conditions, one of which is that the observations are from continuous
symmetrical populations. Some of these tests are range-midrange tests. Table
2 contains a summary of the tests and their properties ($x_i$ = ith largest observation,
$i = 1, \cdots, n$; conditions (D) are given in section 7).

PART I. THE RANGE-MIDRANGE TEST 

2. Introduction. In 1929 E. S. Pearson proposed using the range-midrange
test for the case of a sample from a normal population (see [3]) and experimentally
investigated some of its properties for sample sizes of 5 and 10 and
significance levels of 2% and 10% (symmetrical tests). Using the constants
(corresponding to the $D_\alpha$ in this paper) determined for the case of normality,

¹ This paper was presented to a joint meeting of the Institute of Mathematical Statistics
and the American Mathematical Society at New Haven, Conn, in September, 1947. The 
results presented in this paper were obtained in the course of research conducted under the 
sponsorship of the Office of Naval Research. This research was performed while the 
author was at Princeton University. 




significance level and power function properties of these four tests were experimentally
investigated for several non-normal populations. The results of this
empirical investigation indicated that the range-midrange test is very efficient
for normality and not very sensitive to the assumption of normality if the sample
size is sufficiently small.

This paper presents an analytical investigation of properties of the range- 
midrange test for n = 2, 3, • • • , 10 and a wide range of significance levels. 
The results of this investigation confirm the contention that the range-midrange 
test is very efficient for normality and small samples; also an analytical investiga¬ 
tion of how the significance level changes for the case of certain extremely 
non-normal populations furnishes results which agree with the contention 
that the range-midrange test is not very sensitive to the requirement of normality 
for sufficiently small samples. 

In most cases the results presented in this paper are not directly comparable
with those obtained by Pearson. It was possible, however, to obtain values of
$D_\alpha$ ($\alpha$ = 5%, 1%; n = 5, 10) from the results presented in [3]; these values
were found to be in close agreement with the corresponding values of Table 5.

3. Efficiency of range-midrange. The purpose of this section is to use the
relations derived in section 6 to determine the power efficiencies of tests A, B
and C (see Table 1) for $\alpha$ = 1%, 5% and $n = 2, \cdots, 10$. To do this the method
of defining power efficiency given in [4] and [5] will be used. As shown in [5],
it is sufficient to consider only test A; for any fixed n and $\alpha$, tests A, B and C all
have the same power efficiency (note that the significance level of test C is $2\alpha$).

For a normal population (unknown variance) the most powerful test of the
one-sided alternative $\mu < \mu_0$ is the appropriate Student t-test. The procedure
used in determining the power efficiency of test A consists in first computing the
power function of test A for the given values of n and $\alpha$; then the sample size
of the corresponding Student t-test at this significance level is varied until the
power function of the t-test is approximately equal to that of test A. The size
sample (not necessarily integral) thus obtained for the t-test divided by n is
called the power efficiency of test A for the given values of n and $\alpha$. Intuitively
the power efficiency of a test measures the percentage of the total available
information per observation which is being utilized by that test.
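The definition reduces to a simple ratio once the equivalent t-test sample size is known. The sketch below applies it to the pairs of sample sizes listed in Table 3; the function name is illustrative, and the matching of the two power functions (the hard part of the procedure) is taken as already done.

```python
def power_efficiency(equivalent_t_sample_size, n):
    """Power efficiency in per cent: the (possibly fractional) sample size of the
    Student t-test whose power function matches that of test A, divided by n."""
    return 100.0 * equivalent_t_sample_size / n


# (equivalent t-test sample size, n for test A) as listed in Table 3
table_3_pairs = [(5.4, 6), (7.5, 10), (5.88, 6), (7.2, 8), (8, 10)]

if __name__ == "__main__":
    for t_size, n in table_3_pairs:
        print(n, t_size, round(power_efficiency(t_size, n)))
```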

Table 3 contains values of the power function for test A. These values were 
computed from equation (3) of section 6 by approximate integration. 

The corresponding values of the power function for the Student t-test were 
found by using the normal approximation given in [6]. This approximation 
was used for fractional degrees of freedom. The sample sizes considered as 
well as the resulting power function values are listed in Table 3. A comparison 
of the power function values for the two types of tests furnishes the approximate 
power efficiencies listed in Table 3. 

For n = 2, test A is itself a Student t-test. The power efficiency is therefore
100% for that sample size. This combined with Table 3 furnishes power





efficiencies at the 1% level for n = 2,6,8,10 and at the 5% level for n = 2, 6, 10. 
The approximate power efficiencies given in Table 4 for other values of n were 
obtained from these values by graphical interpolation. 

Table 4 shows that the power efficiency for $\alpha$ = 1% is very good for n < 8,
while for $\alpha$ = 5% the efficiency is good for n < 6.

TABLE 1

Summary of range-midrange tests

Definitions

  Test based on a sample of size n ($2 \le n \le 10$) from an arbitrary normal population.
  $x_1$ = smallest sample value.
  $x_n$ = greatest sample value.
  $\mu$ = the mean of the normal population.
  $\mu_0$ = given hypothetical mean value to be tested.
  $D$ = [(sample midrange) − (hypothetical mean)]/(sample range)
      = $[(x_n + x_1)/2 - \mu_0]/(x_n - x_1)$.
  $D_\alpha$ = constant depending on n and $\alpha$.
  Values of $\alpha$ versus $D_\alpha$ for $2 \le n \le 10$ and $\alpha$ = 5%, 2.5%, 1%, 0.5% are given in Table 5.

Tests

  Test    Accept               If                      Significance Level
  (A)     $\mu < \mu_0$        $D < -D_\alpha$         $\alpha$
  (B)     $\mu > \mu_0$        $D > D_\alpha$          $\alpha$
  (C)     $\mu \ne \mu_0$      $|D| > D_\alpha$        $2\alpha$
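The three tests of Table 1 can be sketched in a few lines. The function names below are illustrative, and the constant `d_alpha` must be supplied by the user from Table 5 for the chosen n and significance level.

```python
def range_midrange_statistic(sample, mu0):
    """D = [(sample midrange) - (hypothetical mean)] / (sample range)."""
    smallest, largest = min(sample), max(sample)
    if largest == smallest:
        raise ValueError("sample range is zero; D is undefined")
    return ((largest + smallest) / 2.0 - mu0) / (largest - smallest)


def range_midrange_tests(sample, mu0, d_alpha):
    """Decisions of tests A, B and C of Table 1 for a given constant D_alpha.

    A: accept mu < mu0 (level alpha), B: accept mu > mu0 (level alpha),
    C: accept mu != mu0 (level 2 * alpha).
    """
    d = range_midrange_statistic(sample, mu0)
    return {"A": d < -d_alpha, "B": d > d_alpha, "C": abs(d) > d_alpha}


if __name__ == "__main__":
    # n = 3 at the 5% level: Table 5 lists D_alpha = .90
    print(range_midrange_tests([1.0, 2.0, 3.0], mu0=5.0, d_alpha=0.90))
```

Only the two extreme observations enter the computation, which is what makes the graphical form of the test described in section 4 possible.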


4. Construction of graph. In most problems to which a test of the type 
developed in this paper would be applied, the values of the sample can be con¬ 
sidered to have practical lower and upper limits, say a and b. For example, in 
many situations zero is a lower limit for the sample values. From a practical 
viewpoint these limits on the sample values do not contradict the assumption 
that the population is normal, since the area under that part of the normal 
distribution which lies outside the interval (a, b) can be considered negligible.
Thus, since Pr(M/t;“ ^ w) = Pr(M $ o^w), test A can be restated in the form 
Accept $\mu < \mu_0$ if the sample point $(x_1, x_n)$ falls in the region (A) of the $x_1, x_n$








TABLE 2

Some one-sided and symmetrical tests with bounded significance levels

$\max[x_8,\ (.5x_9 + .28x_7 + .22x_8)] < \mu_0$;   $\min[x_2,\ (.5x_1 + .28x_3 + .22x_2)] > \mu_0$   (approximate)







TABLE 3

Power function values for test A

 Type    Sample   Approx.      Significance   Approximate Values of Power Function
 Test    Size     Efficiency   Level          δ = 1/2   δ = 1    δ = 1 1/2   δ = 2    δ = 2 1/2
                  (%)

  t       5.4                    .05           .244      .607     .886        .969
  A       6         90           .05           .259      .599     .868        .967

  t       7.5                    .05           .333      .783     .971
  A      10         75           .05           .351      .779     .962

  t       5.88                   .01                     .248     .551        .820     .957
  A       6         98           .01                     .271     .568        .809     .935

  t       7.2                    .01           .091      .371     .749        .949
  A       8         90           .01           .104      .389     .728        .923

  t       8                      .01           .108      .453     .832        .976
  A      10         80           .01           .124      .462     .814        .963



TABLE 4

Power efficiencies of tests A, B and C for α = 5%, 1% and 2 ≤ n ≤ 10

  α \ n      2       3       4       5       6       7       8       9       10

  .01      100%    99.7%   99.4%   99%     98%     95%     90%     85%     80%
  .05      100%    98.5%   96%     93.5%   90%     86.5%   82.5%   78.5%   75%


TABLE 5

Approximate values of D_α for α = 5%, 2.5%, 1%, 0.5% and 2 ≤ n ≤ 10

  α \ n      2        3        4        5       6       7       8       9       10

  0.5%     31.83    3.02*    1.37*    .85*    .66     .55*    .47₅    .42₅    .39*
  1%       15.91    2.11*    1.04*    .71     .56*    .47₅    .42*    .38     .35*
  2.5%      6.35    1.30     .74      .52     .43     .37₅    .33     .30     .27₅
  5%        3.16    .90*     .55₅*    .42₅    .35*    .30     .26₅    .24     .22₅*

* These values of $D_\alpha$ were verified directly by substitution and integration.
The remaining values of $D_\alpha$ for $3 \le n \le 10$ were obtained from these and other
values of $D_\alpha$ ($\alpha \ne$ .005, .01, .025, .05) by graphical interpolation.








plane defined by

$(1/2 + D_\alpha)x_n + (1/2 - D_\alpha)x_1 < \mu_0, \quad x_n > x_1, \quad a < x_1, \quad x_n < b.$

TABLE 6

Effect of non-normality on the significance level of the range-midrange test

                                                               Significance Level
Probability Density          n    Test A                           Test B                           Test C
Function

Normal                            .05     .025    .01     .005     .05     .025    .01     .005     .10     .05     .02     .01

1 if 0 < x < 1               3    .064    .039    .018    .010     .064    .039    .018    .010     .128    .078    .036    .020
0 otherwise                  4    .053    .033    .017    .0096    .053    .033    .017    .0096    .106    .066    .034    .0192
Mean = 1/2                   5    .043    .029    .015    .0094    .043    .029    .015    .0094    .086    .058    .030    .0188

(1/2)e^{−|x|}                3    .036    .017    .0063   .0031    .036    .017    .0063   .0031    .072    .034    .0126   .0062
if −∞ < x < ∞                4    .043    .016    .0055   .0024    .043    .016    .0055   .0024    .085    .032    .0101   .0048
Mean = 0                     5    .095    .026    .0059   .0027    .095    .026    .0059   .0027    .190    .052    .0118   .0054

(3/2)x² if −1 < x < 1        3    .119    .104    .073    .050     .119    .104    .073    .050     .238    .208    .146    .100
0 otherwise                  4    .062    .061    .055    .045     .062    .061    .055    .045     .124    .122    .110    .090
Mean = 0                     5    .031    .031    .031    .029     .031    .031    .031    .029     .062    .062    .062    .058

e^{−x} if 0 < x < ∞          3    .014    .0067   .0025   .0012    .158    .108    .059    .035     .172    .115    .062    .036
0 otherwise                  4    .013    .0048   .0016   .0007    .144    .104    .065    .042     .157    .109    .067    .043
Mean = 1                     5    .017    .0055   .0013   .0006    .122    .096    .061    .045     .139    .102    .062    .046

2x if 0 < x < 1              3    .035    .019    .0075   .0038    .096    .061    .030    .017     .131    .080    .038    .021
0 otherwise                  4    .031    .016    .0065   .0031    .083    .055    .031    .018     .114    .071    .038    .021
Mean = 2/3                   5    .028    .015    .0057   .0031    .068    .050    .028    .019     .096    .065    .032    .020

3x² if 0 < x < 1             3    .027    .014    .0053   .0026    .112    .072    .037    .021     .139    .086    .042    .024
0 otherwise                  4    .024    .011    .0043   .0019    .099    .067    .039    .024     .123    .078    .043    .026
Mean = 3/4                   5    .023    .012    .0037   .0019    .082    .061    .036    .025     .105    .073    .040    .027


Likewise test B can be restated as
Accept $\mu > \mu_0$ if $(x_1, x_n)$ falls in the region (B) defined by

$(1/2 - D_\alpha)x_n + (1/2 + D_\alpha)x_1 > \mu_0, \quad x_n > x_1, \quad a < x_1, \quad x_n < b.$





Test C now becomes

Accept $\mu \ne \mu_0$ if $(x_1, x_n)$ falls in either of the regions (A) or (B).

Figure 1(i) contains a schematic diagram of the regions (A) and (B). Test A
can be applied by constructing a graph of the region (A) and giving the instructions
to accept $\mu < \mu_0$ if $(x_1, x_n)$ falls in (A). Similarly for test B and region (B).
Test C is applied by constructing a graph of both (A) and (B) and accepting
$\mu \ne \mu_0$ if $(x_1, x_n)$ falls in either (A) or (B).

Frequently it is desirable to consider more than one significance level
simultaneously. This can be accomplished in the manner indicated by Figure 1(ii).


5. Effect of non-normality on significance level. It has been shown that the
range-midrange test compares very favorably with the Student t-test for sufficiently
small samples and normality. In practice, however, it may happen that
normality is assumed for cases in which the population is not even approximately
normal. Although this represents an error in judgment on the part of the
person applying the test, such situations will undoubtedly occur if the range-midrange
test is used very frequently. The purpose of this section is to
investigate the effect of non-normality on the significance level of the range-midrange
test when the values of $D_\alpha$ based on normality are used. The corresponding
effect of these non-normal populations on the significance level of
the t-test was not considered because of computational difficulties; however the
effect of some other non-normal populations on the significance level of the
t-test was experimentally investigated by Pearson in [3]. The results of this
empirical investigation and of later investigations show that the significance
level of the t-test is not very sensitive to the requirement of normality for small
samples.

Six populations were chosen for investigation. Three of these populations are 





symmetrical while the remaining three are strongly asymmetrical. These
particular populations were considered because their probability density functions
have a wide variety of different shapes; also because the significance level
of the range-midrange test can be computed in closed form for these populations.

The populations investigated are defined by their probability density functions.
Table 6 contains a list of the probability density functions considered along with
the resulting significance levels for the range-midrange test. The cases investigated
are n = 3, 4, 5 and $\alpha$ = 5%, 2.5%, 1%, 0.5%. Larger values of
n were not used because of computational difficulties. The situation of n = 2
was not considered because the t-test and the range-midrange test are identical
for this case. The significance levels of Table 6 were computed by making
direct application of (1) and (2) of section 6.
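As an illustration of such a closed-form computation, consider the rectangular population on (0, 1) with hypothetical mean 1/2. Integrating the joint density of the smallest and largest observations over the region $D < -D_\alpha$ gives $(1/2)^n/(1/2 + D_\alpha)^{n-1}$ for test A. This expression is derived here for illustration and is not quoted from the paper; the sketch below evaluates it and cross-checks it by simulation. All function names are hypothetical.

```python
import random


def uniform_level_test_a(n, d_alpha):
    """Significance level of test A for a rectangular (0, 1) population, mu0 = 1/2.

    Closed form (1/2)**n / (1/2 + D_alpha)**(n - 1), derived for this sketch
    from the joint density n(n-1)(x_n - x_1)**(n-2) of the extreme observations.
    """
    return 0.5 ** n / (0.5 + d_alpha) ** (n - 1)


def simulated_level_test_a(n, d_alpha, trials=200_000, seed=1):
    """Monte Carlo estimate of Pr(D < -D_alpha) for the same population."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.random() for _ in range(n)]
        smallest, largest = min(sample), max(sample)
        d = ((largest + smallest) / 2.0 - 0.5) / (largest - smallest)
        if d < -d_alpha:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # D_alpha for n = 3 at the 5%, 2.5%, 1% and 0.5% levels, as listed in Table 5
    for d_alpha in (0.90, 1.30, 2.11, 3.02):
        print(d_alpha, round(uniform_level_test_a(3, d_alpha), 3))
    print(simulated_level_test_a(3, 0.90))
```

The four values printed for n = 3 may be compared with the first row of Table 6 for the rectangular population.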


6. Significance level and power function derivations. The purpose of this 
section is to present derivations of the significance level and power function 
expressions which were used in the preceding sections. First a general probabil¬ 
ity expression will be evaluated. Direct applications of the results obtained 
for this expression yield the required significance level and power function 
relations. 

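Let $x_1$ and $x_n$ be the smallest and largest values, respectively, of a sample of
size n drawn from a population with probability density function $f(x)$. The
non-zero probability range of this population is $\gamma < x < \beta$. Also let three
constants $c_1, c_n, c_0$ $(c_1 + c_n = 1)$ be given and consider the value of

$\Pr(c_1x_1 + c_nx_n < c_0)$, where $M(z) = \int_{-\infty}^{z} f(y)\,dy$.

Using direct methods it is found that the value of this expression is given by

$$
\begin{cases}
[M(c_0)]^n & \text{if } c_1 = 0, \\[4pt]
0 & \text{if } 0 < c_1 < 1,\ c_0 \le \gamma, \\[4pt]
\left[M\!\left(\dfrac{c_0 - c_1\gamma}{c_n}\right)\right]^n - n\displaystyle\int_{c_0}^{(c_0 - c_1\gamma)/c_n} \left[M(y) - M\!\left(\dfrac{c_0 - c_ny}{c_1}\right)\right]^{n-1} f(y)\,dy & \text{if } 0 < c_1 < 1,\ c_0 > \gamma, \\[4pt]
1 - [1 - M(c_0)]^n & \text{if } c_1 = 1, \\[4pt]
0 & \text{if } c_1 > 1,\ c_0 \le \min[\gamma,\ c_1\gamma + c_n\beta], \\[4pt]
1 - \left[M\!\left(\dfrac{c_0 - c_1\gamma}{c_n}\right)\right]^n - n\displaystyle\int_{(c_0 - c_1\gamma)/c_n}^{\beta} \left[M(y) - M\!\left(\dfrac{c_0 - c_ny}{c_1}\right)\right]^{n-1} f(y)\,dy & \text{if } c_1 > 1,\ c_1\gamma + c_n\beta < c_0 \le \gamma, \\[4pt]
1 - n\displaystyle\int_{c_0}^{\beta} \left[M(y) - M\!\left(\dfrac{c_0 - c_ny}{c_1}\right)\right]^{n-1} f(y)\,dy & \text{if } c_1 > 1,\ c_0 > \gamma.
\end{cases}
\tag{1}
$$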





The value of $\Pr(c_1x_1 + c_nx_n < c_0)$ for $c_1 < 0$ can be obtained from the above
results for $c_1 > 1$. It is easily shown that

(2) $\Pr(c_1x_1 + c_nx_n < c_0) = 1 - \Pr(c_1'y_1 + c_n'y_n < c_0')$,

where

$c_1' = c_n, \quad c_n' = c_1, \quad c_0' = -c_0,$

and $y_1$, $y_n$ are the smallest and largest values, respectively, of a sample of size n
drawn from a population with probability density function $g(y) = f(-y)$. Thus if
$c_1 < 0$, $c_1' = c_n > 1$ and obvious modifications of the results for $c_1 > 1$ will
furnish the value of $\Pr(c_1'y_1 + c_n'y_n < c_0')$.

The above general results were used in section 5 to investigate the effect of
non-normality on the significance level of the range-midrange test.

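Now consider the case in which the n sample values are drawn from a normal
population with mean $\mu$ and variance $\sigma^2$. Then, for test A,

Power Function $= \Pr\{(1/2 - D_\alpha)x_1 + (1/2 + D_\alpha)x_n < \mu_0\}$

$= \Pr\{(1/2 - D_\alpha)z_1 + (1/2 + D_\alpha)z_n < \delta\},$

where

$z_1 = (x_1 - \mu)/\sigma, \quad z_n = (x_n - \mu)/\sigma, \quad \delta = (\mu_0 - \mu)/\sigma.$

Using the above results with

$f(x) = \dfrac{1}{\sqrt{2\pi}}\,e^{-x^2/2}, \qquad N(z) = \dfrac{1}{\sqrt{2\pi}}\displaystyle\int_{-\infty}^{z} e^{-x^2/2}\,dx,$

it is found that the power function for test A is

$$
\begin{cases}
1 - n\displaystyle\int_{\delta}^{\infty} \left[N(y) - N\!\left(\dfrac{\delta - (1/2 + D_\alpha)y}{1/2 - D_\alpha}\right)\right]^{n-1} f(y)\,dy & \text{if } D_\alpha < 1/2; \\[6pt]
[N(\delta)]^n & \text{if } D_\alpha = 1/2; \\[6pt]
n\displaystyle\int_{-\infty}^{\delta} \left[N\!\left(\dfrac{\delta - (1/2 - D_\alpha)y}{1/2 + D_\alpha}\right) - N(y)\right]^{n-1} f(y)\,dy & \text{if } D_\alpha > 1/2.
\end{cases}
\tag{3}
$$

The value of $D_\alpha$ (for given n) corresponding to a specified significance level $\alpha$
for test A is obtained by solving the equation

(4) $\alpha = P_A(0)$,

where $P_A(\delta)$ is the power function for test A. From symmetry and the fact that
test C is a combination of tests A and B, test B has significance level $\alpha$ and test
C significance level $2\alpha$ for this value of $D_\alpha$.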
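The power function and the equation $\alpha = P_A(0)$ can be evaluated numerically. The following is a minimal sketch, not the computing procedure used for the tables: it assumes the three-case form of the power function for a unit normal population, with the case $D_\alpha > 1/2$ written as an integral over the smallest observation, and it uses Simpson's rule with a truncated tail and bisection. The function names, the truncation point `tail` and the step count are assumptions of the sketch.

```python
import math


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def simpson(func, lo, hi, steps=4000):
    """Composite Simpson rule with an even number of steps."""
    h = (hi - lo) / steps
    total = func(lo) + func(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * func(lo + i * h)
    return total * h / 3.0


def power_test_a(n, d_alpha, delta, tail=10.0):
    """Pr{(1/2 - D)z_1 + (1/2 + D)z_n < delta} for a unit normal sample of size n."""
    c1, cn = 0.5 - d_alpha, 0.5 + d_alpha
    if c1 == 0.0:
        return norm_cdf(delta) ** n
    if c1 > 0.0:
        # integrate over the largest observation y > delta
        def g(y):
            return (norm_cdf(y) - norm_cdf((delta - cn * y) / c1)) ** (n - 1) * norm_pdf(y)
        return 1.0 - n * simpson(g, delta, max(delta, 0.0) + tail)

    # c1 < 0: integrate over the smallest observation y < delta
    def g(y):
        return (norm_cdf((delta - c1 * y) / cn) - norm_cdf(y)) ** (n - 1) * norm_pdf(y)
    return n * simpson(g, min(delta, 0.0) - tail, delta)


def solve_d_alpha(n, alpha, lo=0.0, hi=100.0, iterations=50):
    """Solve alpha = P_A(0) for D_alpha by bisection (P_A(0) decreases as D grows)."""
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if power_test_a(n, mid, 0.0) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    print(solve_d_alpha(2, 0.05))        # compare with Table 5, n = 2
    print(power_test_a(3, 0.90, 0.0))    # compare with the 5% level for n = 3
```

For n = 2 the result can be compared with the Student t-test with one degree of freedom, which provides an independent check on the integration.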

For n = 2, test A becomes a Student t-test with one degree of freedom if $D_\alpha$
is replaced by $t_\alpha/2$. The relation $D_\alpha = t_\alpha/2$ gives an easily applied method of
computing $D_\alpha$ for this case.
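Since the Student t distribution with one degree of freedom is the Cauchy distribution, the upper percentage point has the closed form $t_\alpha = \tan[\pi(1/2 - \alpha)]$. The short sketch below uses this fact, which is standard but is not stated in the paper, to reproduce the n = 2 column of Table 5; the function name is illustrative.

```python
import math


def d_alpha_n2(alpha):
    """D_alpha for n = 2 from D_alpha = t_alpha / 2, where t_alpha is the upper
    alpha point of Student's t with one degree of freedom (a Cauchy variable)."""
    t_alpha = math.tan(math.pi * (0.5 - alpha))
    return t_alpha / 2.0


if __name__ == "__main__":
    for alpha in (0.05, 0.025, 0.01, 0.005):
        print(alpha, round(d_alpha_n2(alpha), 2))
```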

Approximate values of $D_\alpha$ for $\alpha$ = 5%, 2.5%, 1%, 0.5% are contained in
Table 5 for $2 \le n \le 10$. For $3 \le n \le 10$, these values were obtained from (3)
and (4) by approximate integration and interpolation. For n = 2, the relation
between $D_\alpha$ and $t_\alpha$ was used.


PART II. SOME TESTS WITH BOUNDED SIGNIFICANCE LEVELS 


7. Introduction. In this part some significance tests (for the mean) are
derived which are based on the assumption of a sample from a normal population.
These tests have the property that the significance level is bounded near the
value for normality under very general conditions. These conditions are

(D)  (a) The observations used for a test are independent.
     (b) Each observation comes from a continuous symmetrical population
         with mean $\mu$.

It is to be emphasized that no two observations are necessarily drawn from
the same population.

The bounded significance level tests developed are summarized in Table 2.
These tests can be used to supplement the tests presented in [5] for n < 9, where
the tests of [5] do not furnish a very wide variety of suitable significance levels.

8. Outline of derivations. Let us consider the range-midrange test for the
more general situation in which the set of independent observations used are from
arbitrary but fixed populations satisfying conditions (D). Let $D_\alpha$ be redefined
so that the resulting test A has significance level $\alpha$. Then it is easily seen that
$D_\alpha$ is a monotone decreasing function of $\alpha$. Thus the significance level of the
modified test A will always be less than or equal to $(1/2)^n$ if $D_\alpha \ge 1/2$. The
significance level bounds for the tests n = 4, $\alpha$ = 5%; n = 5, $\alpha$ = 2.5%; n = 6,
$\alpha$ = 1%; n = 7, $\alpha$ = 0.5% of Table 2 were obtained from this relation and
obvious significance level relations among tests A, B and C.

The significance levels (for normality) for the tests n = 5, $\alpha$ = 5%; n = 6,
$\alpha$ = 2.5%; n = 7, $\alpha$ = 1%; n = 8, $\alpha$ = 0.5% were obtained by approximate
integration of the expression derived for $\Pr[(1/2 + c)x_n + (1/2 - c)x_{n-1} < \mu]$,
(0 < c < 1/2), for several values of c and then graphical interpolation (here $\alpha$
is the one-sided test significance level). The significance level bounds were
determined from

$(1/2)^n = \Pr(x_n < \mu) < \Pr[(1/2 + c)x_n + (1/2 - c)x_{n-1} < \mu] < \Pr[(1/2)(x_n + x_{n-1}) < \mu] = (1/2)^{n-1}.$

The significance levels for the tests n = 8, $\alpha$ = 1%; n = 9, $\alpha$ = 0.5% were
obtained by considering the relations

$\Pr\{\max[x_{n-1},\ (x_n + x_{n-i})/2] < \mu\} = (1 + i)(1/2)^n, \quad (i = 0, 1, 2, 3),$

and applying linear interpolation to find a value c, (0 < c < 1/2), such that
$\Pr\{\max[x_{n-1},\ 0.5x_n + cx_{n-2} + (1/2 - c)x_{n-1}] < \mu\}$ has the desired value.
The significance level bounds were found from

$\Pr\{(1/2)(x_n + x_{n-1}) < \mu\} < \Pr\{\max[x_{n-1},\ 0.5x_n + cx_{n-2} + (1/2 - c)x_{n-1}] < \mu\} < \Pr\{\max[x_{n-1},\ (1/2)(x_n + x_{n-2})] < \mu\}.$

The derivation of the power efficiencies listed in Table 2 will not be considered
here. Detailed derivations can be found in [7].
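The bounds and the interpolation step can be illustrated numerically. The sketch below takes the relation $(1 + i)(1/2)^n$ from the text; the linear map between the level and c (c = 0 at i = 1, c = 1/2 at i = 2) is an assumption about how the interpolation was carried out, although it does reproduce the coefficients .28 and .22 that appear in Table 2. The function names are illustrative.

```python
def bound_level(n, i):
    """Pr{max[x_(n-1), (x_n + x_(n-i))/2] < mu} = (1 + i) * (1/2)**n under conditions (D)."""
    return (1 + i) * 0.5 ** n


def interpolate_c(n, target_level):
    """Linear interpolation for c in 0.5*x_n + c*x_(n-2) + (0.5 - c)*x_(n-1).

    Assumption of this sketch: the level varies linearly from bound_level(n, 1)
    at c = 0 to bound_level(n, 2) at c = 1/2.
    """
    lower, upper = bound_level(n, 1), bound_level(n, 2)
    if not lower <= target_level <= upper:
        raise ValueError("target level lies outside the attainable bounds")
    return 0.5 * (target_level - lower) / (upper - lower)


if __name__ == "__main__":
    for n, alpha in ((8, 0.01), (9, 0.005)):
        c = interpolate_c(n, alpha)
        print(n, alpha, bound_level(n, 1), bound_level(n, 2), round(c, 2), round(0.5 - c, 2))
```

The same function `bound_level` with i = 0 and i = 1 gives the bounds $(1/2)^n$ and $(1/2)^{n-1}$ quoted for the tests with n = 5, 6, 7, 8.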

REFERENCES 

[1] J. F. Daly, "On the use of the sample range in an analogue of Student's t-test," Annals
of Math. Stat., Vol. 17 (1946), pp. 71-74.

[2] E. Lord, "The use of range in place of standard deviation in the t-test," Biometrika, Vol. 34
(1947), pp. 41-67.

[3] E. S. Pearson, "The distribution of frequency constants in small samples from non-normal
symmetrical and skew populations," Biometrika, Vol. 21 (1929), pp. 280-286.

[4] John E. Walsh, "On the power function of the sign test for slippage of means," Annals
of Math. Stat., Vol. 17 (1946), pp. 358-362.

[5] John E. Walsh, "Some significance tests for the median which are valid under very
general conditions," Annals of Math. Stat., Vol. 20 (1949), pp. 64-81.

610-611. Submitted for publication in Annals of Math. Stat. 

[6] N. L. Johnson and B. L. Welch, "Applications of the non-central t-distribution,"
Biometrika, Vol. 31 (1940), p. 376.

[7] John E. Walsh, "Some significance tests for the median which are valid under very
general conditions," unpublished thesis, Princeton University Library, Princeton,
N. J.



ASYMPTOTIC STUDENTIZATION IN TESTING OF HYPOTHESES

By Herman Chernoff¹

Cowles Commission for Research in Economics

1. Summary. A method suggested by Wald for finding critical regions of
almost constant size and various modifications are considered. Under reasonable
conditions the sth step of this method gives a critical region of size $\alpha + R_s(\theta)$
where $\theta$ is the unknown value of the nuisance parameter, $R_s(\theta) = O(N^{-s/2})$ and N
is the sample size. The first step of this method gives the region which is
obtained by assuming that an estimate $\hat\theta$ of the nuisance parameter is actually
equal to $\theta$.

2. Introduction. The problem of nuisance parameters often arises in the
testing of hypotheses in the following form: It is desired to construct a test of a
hypothesis H so that the probability of rejecting H if it is true is equal to $\alpha$.
However the probability distribution of the data is not uniquely determined
by H. Indeed, if the hypothesis is true then the observations have a distribution
depending on a nuisance parameter $\theta$ whose value is not known. Generally a
critical region will have a size which depends on the value of $\theta$. Neyman has
done considerable work on the problem of finding similar regions, i.e., regions
whose size is independent of $\theta$.

Wald has suggested the following method of finding critical regions whose
size is almost independent of $\theta$. Suppose that t is a statistic such that if $\theta$
were known then the critical region $t < c_1(\theta)$ would be a good critical region
for testing the hypothesis H. Suppose also that $\hat\theta$ is an estimate of $\theta$ and that
$g(t, \hat\theta \mid \theta)$ represents the joint distribution of t, $\hat\theta$ under H when $\theta$ is the value
of the nuisance parameter. Then consider the regions

$t < c_1(\hat\theta)$ where $\Pr\{t < c_1(\theta)\} = \alpha$ independent of $\theta$;

$t < c_1(\hat\theta) + c_2(\hat\theta)$ where $\Pr\{t - c_1(\hat\theta) < c_2(\theta)\} = \alpha$ independent of $\theta$;

$\cdots$

$t < c_1(\hat\theta) + \cdots + c_s(\hat\theta)$ where $\Pr\{t - c_1(\hat\theta) - \cdots - c_{s-1}(\hat\theta) < c_s(\theta)\} = \alpha$
independent of $\theta$.

Under the assumption that $\hat\theta$ is close to $\theta$ it is reasonable to expect that
$\Pr\{t < c_1(\hat\theta)\}$ would be close to $\alpha$. It might also be expected that
$\Pr\{t < c_1(\hat\theta) + c_2(\hat\theta)\}$ would be even closer to $\alpha$.

This method has been shown to have good properties when considered from
the asymptotic point of view. Suppose that t, $\hat\theta$ are two sequences of statistics

¹ This paper is based on a dissertation written under the supervision of Professor Abraham
Wald and submitted as partial fulfilment of the requirements for Ph.D. in the Graduate
Division of Applied Mathematics of Brown University.



(depending on N, the size of the sample or an analogous variable) with distribution
represented by $g(t, \hat\theta \mid \theta)$ where N is understood to be present. Then it has
been shown that under reasonable conditions, with modifications for the sake of
calculation,

$|\Pr\{t < c_1(\hat\theta) + \cdots + c_s(\hat\theta)\} - \alpha| = O(N^{-s/2}).$

The statement of the theorem presenting this result will be given in section 4.
It has also been shown that if, roughly speaking, $\hat\theta$ is distributed almost symmetrically
about $\theta$, the above result may be obtained in half the steps, i.e.,

$|\Pr\{t < c_1(\hat\theta) + \cdots + c_s(\hat\theta)\} - \alpha| = O(N^{-s}).$

It is true that under relatively weak conditions and for fixed N it is possible for
any $\epsilon > 0$ to obtain a function $h(\hat\theta)$ such that $|\Pr\{t < h(\hat\theta)\} - \alpha| < \epsilon$. However
such a critical region can have very poor properties from the point of view of the
alternative hypotheses, especially if $h(\hat\theta)$ is a very wildly oscillating function.
On the other hand this objection does not apply to Wald's method for large N
because

$|c_1^{(r)}(\theta)| < M, \quad r = 0, 1, \cdots, s;$

$|c_2^{(r)}(\theta)| < MN^{-1/2}, \quad r = 0, 1, \cdots, s - 1;$

$\cdots$

$|c_s^{(r)}(\theta)| < MN^{-(s-1)/2}, \quad r = 0, 1,$

and hence $c_2(\hat\theta) + \cdots + c_s(\hat\theta)$ is almost constant over "that small range in
which $\hat\theta$ will probably fall."

In the above it has been implied that $\theta$ is a one dimensional variable. However
the results are easily extended to the case where $\theta$ is a k-dimensional variable.

The direct application of the method is often quite difficult because of the
calculations involved. Modifications can be applied which simplify the calculations.
Such modifications usually consist of changing the $c_r(\hat\theta)$ by a small
amount provided the remainder is simple and "well behaved." A case where
considerable simplifications can be made is that where $g_1(t \mid \hat\theta, \theta)$, the conditional
distribution of t, can be expanded in a Taylor Expansion,

$g_1(t \mid \hat\theta, \theta) = g_1(c_1(\theta) \mid \theta, \theta) + (t - c_1(\theta))\,\dfrac{\partial g_1}{\partial t} + (\hat\theta - \theta)\,\dfrac{\partial g_1}{\partial \hat\theta} + \cdots + r_s,$

where the partial derivatives "behave." This case will be described in detail in
section 3, and an example previously treated by Welch (see [1]) will be discussed
in section 4.

Another case where simplifications often arise is the asymptotic case, that is 
the case where g{t, 6 | 0) has an asymptotic expansion. The asymptotic case 





may also be regarded as an extension of the following partition principle which 
is very useful. If g{t, 6\6) = go{t, ^ | tf) + h(t, h \ 0) and JJ} hfdtS < 
and if y>(^) is such that 

/ oo ^^(e) I 

dH dtgo(tJ\e)-a < 
oe J—00 I 

then I Pr{< < <p{i)\ — a | < Thus our theorems apply to g{t, 6 \ 0) 

$g = g_0 + h$ where $g_0$ has sufficient differentiability properties.


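3. The Taylor expansion treatment. Let $g(t, \hat\theta \mid \theta) = g_1(t \mid \hat\theta, \theta)\,g_2(\hat\theta \mid \theta)$ where
$g_1$ is the conditional density of t given $\hat\theta$ and $g_2(\hat\theta \mid \theta)$ is the marginal density of
$\hat\theta$; $g_3(t \mid \theta) = \int d\hat\theta\, g(t, \hat\theta \mid \theta)$ is the marginal density of t. In what follows we shall
use M as a generic bound. Thus the statement $f(t, \theta) < M(\theta_1, \theta_2)$, $\theta_1 < \theta < \theta_2$,
means that there is a constant M depending on $(\theta_1, \theta_2)$ and independent of
t, $\theta$, N so that $f(t, \theta) < M(\theta_1, \theta_2)$ for $\theta_1 < \theta < \theta_2$.

First we obtain $c_1(\theta)$ so that $\Pr\{t < c_1(\theta)\} = \alpha$.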

Then we have 

Theorem 1. If for every finite interval ( 01 , 02 ), 


(i) 


^3 

d0’> 


(tl0 +A) 


<Gi(t, 0 )<G 2 (t), lAl<A'( 0 i, 02 ,N), p = 0,l,--*s, 


01 < 0,0 + A< 02, 

where f Goit) dt < ilf(9i, ^ 2 ), Gi and G 2 may depend on N, 0i, and 0t 


(ii) 


d^^^gsit I 0) 

d0»dt<‘ 


is continuous in t, 0 and 


bounded in absolute value by M(Ci, Co, 0 i, 02) for p + q < s, 0 i < 0 < 02 , 
Cl ^ t ^ Co 

(iii) 0 < Co, 01,02) ^ ^ h< 0 < 00, Ci<t < Co-, 

(iv) 0 < a < 1, 

then Pr{< < Ci(tf)} = a defines Ci(fl) uniquely and so that 1 Ci’‘\0) | < M(0i, 02 ) 
forp — 0,1, ,8 01 < 0 < 00 . 

Proof. Since goit \ 0) is positive, ci(0) is uniquely defined by condition (i). 
From this and conditions (i) and (ii) it follows tW c[(0) exists and is given by 

+ cii0)gi(ci(0) i 0). 


( 1 ) 





We may continue in this fashion differentiating formally p < s times to get 
<2) dt 

+ ci^\6)gi(ciid) 1 = 0, ik , t + i < p. 

From the continuity and positiveness it follows that c^^\d) is continuous. Since 

/ Giif) dt <M(0i, 6i) it follows that there is a constant M{di, 0^) so that 

eo 

Giit) dt < a, / (? 2 (<) dt < I - a. 

00 ^ 

Thus 


1 ci(0) I < M(Oi, 02). 

From (1) and condition (i) it follows easily that j c'i(0) \ < M{0i, 0i). Similarly 
we obtain | Ci^^{0) 1 < M(tfi, 0^ for < ^ < ^ 2 . 

While the conditions (i) to (iv) suffice to insure the results of the theorem 
they are not necessary. It is often possible to obtain these properties of Ci(0) 
in particular examples where gi{t, 0) does vanish at points so long as gt(ci(0), 0) 
behaves well. 

Definition 1. is an admissible function of order m(m < s, s fixed in 

advance) if v>m(d) =* Ci(^) + • • • + Cm(^) where Prjt < Ci{0)\ = a and 

(3) I I < M( 02 , 02 )Ar<‘-»'*, p = 0,1, • • • , s + 1 - t, < 0 < <>*. 
Now let 

( 4 ) Hk{0) = W*'" - 0f = AT*'* I 0 - 0)'‘gi(,^ \ 0) d0 and 

J-oo 

(5) ^ l<—Cl^ • 

We have 
Theorem 2. If 

(i) Pr{< < ci(«)) = a, 0 < a < 1, and | 1 < M(«i, 02 ), 

01 < 0 < 02 , p = 0, I, - s; 

(ii) S =» S(N) = 0(1) is a function of N such that 

f dd\S-0 M 1 0) < M(0i, 02)N-*'\ 0i<0<02,k = O,l,^-- ,s; 

nP-H 

(iii) I 9i(t \^,O)\<M(0i, 02 ), p + q = s, 

1 1 - ci(0) I < P, M I < 5, 





where 

p = Max. I Cl (d) - ci(fl) I + ij > 0, ei< e < ei; 

(iv) I I < Midi , 61 ) for p = 0, 1, • ■ • , s - A:, A: = 1, ... , 

6i<e<9i’r 

(V) \G^^(e)\<M(ei,di) for 1 = 0,1,... ,s-p-q, 

p + q < s — 1 ; 

(vi) <pm 0 ) is an admissible function of order m < s, 
then 

(6) Pr{< < ^„(^)} = a + + • • • + r^«?)Ar'« 

where 

I < M(«i, for p = 0,1, ••• , « - j, j < s, 6 i<e<e%. 

Proof. Expand gi{t | ^, d) in a Taylor Expansion about t = ci(tf), ^ — 0 , 
with remainder terms of order s in t — Ci(ff), ^ and expand Ci{ 6 ) about 
6 = $ where the remainder term is of order « + 1 — t. Then for | d — 6 | < 
we have 

gi(t I 6 , (?) dt = P{(^ - ey, cy\e), RN-'^, 

Ci(t) 

where P is a polynomial and | P | 0 — 0 y~'N~'‘^ for | 5 — 51 <5. 

•Integrating over \ 6 — 6 \ < S, we use conditions (ii), (iv) and (v) and the 
theorem follows. By a similar argument we have 
Theorem 3. If 

(i) the conditions of Theorem 2 hold for each (ffi, 63 ) so that 

— W < ft < ^1 < 02 < | 9 j < 00 

and 

(ii) ffi(ci(0) !«,«)> (1/M(0i, 0*)) >0, 01 < 0 < 02, 

then the sequence 

VJi(d) = Ci 0 )\ 

(g) <P^ 0 ) = ci 0 ) - ri.i(^)Ar"*; . 

_ /a. r^,.«_i(d)Ar^’"-*»* 

<P«(d) , j,i(ci(^)|d,d) ’ 

is a sequence of admissible Junctions such that 
(a) Pr{< < vM = a + R(e)N-”'\ 


m < s. 





v}here | R{0) | < , 0i) for < $i < 0 < Ot < 02 . 

These theorems permit us to obtain and to calculate critical regions whose 
size is asymptotically close to a. 

In Theorem 2, condition (ii) was much stronger than necessary. It may be 
relaxed if we define 


Hk{0)=i. N'‘%{^\0){^-0td0, 


where 


Pr{| ^ - 0 I > 5} < M(^0i , 02 )N~‘'\ 5 = b{N) = 0(1). 

However this may complicate the calculations. 

The symmetric case arises when the first moment almost vanishes, i.e. 

<10) I H\^\0) I < M{0,, 02)N-^'\ p = 0,1, • •. , 8 - 1, 02<0<02. 

In this case we have instead of the sequence given in Theorem 3, the sequence 

<(>i% = ci(d); 


< 11 ) 


n.2(d)Ar'+ r,.,(d)ir’'' 


V’m(^) = — 


rm~l,2m~20)N~^'^ ** + rm-l,2m-10)N 


gim) \0j} 

which is a sequence of admissible functions such that 

Pr{< < <Pr.0)\ = a + r„,2n,(0)N-”' + • • • + r„,,(<»)Ar‘« 

I ri,'i(fl) 1 < M{0i , 02 ) 01 < 0 < 02 p = 0, 1, • • • , s - 


4. An example. The following example, previously treated by Welch from a
different point of view, will furnish an illustration of the applicability of the
theorems to the case where $\theta$ is a k-dimensional parameter. It will also serve
as an example of an extended type of symmetry. That is, it has the property
that 1 Hlkli{0) I <M{0i, 02 )N~^'^, and hence, in the sequence (11), the rm,2m+ii0) 
terms effectively vanish thereby simplifying the calculations considerably. 

We suppose that t is a normally distributed variable with mean $\mu$ and variance
$\sigma^2 = \lambda_1\sigma_1^2 + \cdots + \lambda_k\sigma_k^2$ where the $\lambda_i$ are known positive constants, the $\sigma_i^2$ are
unknown parameters each of which is independently estimated by $s_i^2$ where
$N_is_i^2/\sigma_i^2$ has the $\chi^2$ distribution with $N_i$ degrees of freedom. It is desired to test
the hypothesis that $\mu = 0$ so that the probability of rejecting the hypothesis
if it is true should equal $\alpha$. Under the hypothesis the joint density distribution
of t, $s_1^2, \cdots, s_k^2$ is given by


g{t, si, ■■-,81/^1, ■■■,4) 


g-«*/ 2 (r» 

Vw 


k 


I[gisl\<Tl,Nd, 


( 12 ) 





where the moments of s? — (r< = are given by the coefficients of uV^t 

in the expansion about u = 0 of 

= 0 ; 

= 2<rJ ; 

= (V + 2 Ar 7 V?. 


We define Cj(fl) by Pr{t < Ci(fl)) = a where 

6 = {<r\, <rl, • • • , <rl) and d = (sj, Sa, • • • , s*), 

Ci{B) = Ci<r. 

Now aj(tf) — a = Pr{ci(^) < t < may be computed within terms of order 
NT by expanding 

Ci(d) Ci<r + Cl ^ X,(Sj ~ ~ Cl 23 X<X,(Sj — — oc/) 


8<r» 


\/2irff* 




+ (t — Ci(r)(—Ci/ff)}, 


whence 


«2(fl) « « ^ dsi • • • d«* n ffCci k<; Ni) ^ 

“ ^ — <r<)(s* ~ ~ ^ X,X,(s? — (T<)(s* — 

(E 2X?<.5iV7‘} + 0(E iV7*). 


fci + cf 


\/2ir<r*l 8<7» 


Thus 


and 


where 


C2(e) = "-i^‘ExMW7‘ 


as(e) = Pr|< < cis + E X?s{JV7‘j = a + 0(EiV7*), 

4s^ 


a* = X) •. 


Further approximations become somewhat complex and should be carried out 
in a systematic fashion. 


5. Remarks. The range of application in practical statistical problems of the 
theorems of section 2 may be somewhat more limited than that of the original 





method proposed by Wald. Concerning the original method, the following 
theorems have been established. 

Theorem 4. If 

(i) Pr{< < Ci(fl)) = a, 0 < a < 1, where | | < M{6i, 0i), 6i < 6 < $ 2 , 


p = 0, 1, • • • , s; 

I 5 + A I + A) I ^ /or f + i < s - 1, Cl < < < C 2 , 

^ + A < ^ 2 , I A I < A', where G(6j 6) depends on Ci, C 2 , , 6 ^, Nj 

and is iniegrable in 6 over (— «, «); 

(iii) I I < e),i+j<s-l,Cx<t<C2,0i<9< 02 , 


where 


L0, 0)\d - 0\^ S < Mi0x ,0,, Cl, C,)W-*'*, k = 0, 1, 


(iv) 0 < A(Ci, C 2 , , 02) < A(t) < gt(t I 0) < B(t) < B(Ci , C 2 , , fl*)< «, 

0i< 0 < 02, Ci<t<Ci, 

f B(t) dt < M(0i, 02); 

•Loo 

(v) g(i, ^10) > 0, 

then a sequence c*0), € 2 ( 0 ), C 2 (^), • • • ,c*0), exists where Cm(0) is uniquely defined in 
( 01 , 02 ) 6j/Pr{< — cf(d) — • • • —(vll_i(d) < c„(fl)} = a, and 

14'’Wl < M(0i, 02)^^'”-”'* p = 0,1, • • • , s - m + 1, 0, < <> < <?, 
and Cm(^) is any function so that 

|c:''’((>) - 4'’W I < MAT"’' /or 0i < e < <? 2 , p = 0, 1, • • • , 5 - m, 


and 

I c:''’(d) I < M(01, - 00 <5<«,p = 0. 1 , + 

Finally for arbitrary within the above conditions, 

I Pr{t - cf(d) - ... - c*(b) <0} - a I < M(0i, <? 2 )W'"' for 0i < 0 < 02 

The conditions on the derivatives with respect to A are natural because 
the intuitive approach to the method seems to hinge on the assumption that 
g(t, d + A I + A) changes gradually with respect to A “independent” of the 
value of N. This would not be true of g(t, d | (? + A) for large N. 

The $c_m^*(\theta)$ were introduced in Theorem 4 because in practical examples it is
usually found too difficult to compute $c_m(\theta)$ efficiently. On the other hand
there are many alternative ways of obtaining functions with the properties 







of the $c_m^*(\theta)$. The $c_2(\theta)$, $c_3(\theta)$, etc. mentioned in Theorems 1, 2, 3 play the role
of the $c_m^*(\theta)$ in Theorem 4 with the exception of the condition on $c_m^*(\theta)$ for $\theta$ outside
$(\theta_1, \theta_2)$. The exception is due to the fact that Theorems 1, 2, 3 correspond
to the “infinite case.” Theorem 4 is applicable to those cases where one is
willing to assume that $\theta$ lies in $(\theta_1, \theta_2)$. It often happens that there is no such
reason or that the conditions of the theorem hold only for every closed proper
subinterval of $(\beta_1, \beta_2)$ but not for $\beta_1 < \theta < \beta_2$ itself. In these cases we may
apply

Theorem 5. If

(i) all of the conditions of Theorem 4 apply to every finite proper closed subinterval
$(\theta_1, \theta_2)$ of $(\beta_1, \beta_2)$, where $(\beta_1, \beta_2)$ may be an infinite interval;

(ii) Pr{| ^ - 0 I > 5(iV)j < ilf(ft , 02 )N~"^ for ft < ft < <? < ft < ft , where 
S(N) = 0(1) unless ft or ft is finite, in which case 8(N) = o(l), then a 

sequence Ci(6), C2(0), c*(^), • • • , c*(^), exists, where Cm{0) is uniquely defined in 
(ft. ft) by Pr{t — ci*(0) — C 2 *(^) — • • • — c^_i(^) < 0 ^( 6 )} = a, so that for every 
(ft, ft), 

I c!L^^\0) I < M(ft, ft)Ar<’"-“'%/ft < ft < < ft < ft, ’ 

p = 0 , 1 , • • • , s — m + 1 ’ 

and for c*(0) arbitrary within the above conditions 

I Pr{< < ct{^) + • • • + c:(^)) - a I < M(ft , 02 )N-”''^ 

if ft < ft < S < ft < ft, m < s. 

Essentially this theorem can be proved by reference to the proof of Theorem 4 
applied to the function 

g*(t, ^0) = g{t, 6\0) for 1 ^ - 0 1 < 5; 

= 0 I $ - 0 1 > 6 . 

Some of the conditions in Theorems 4 and 5 are stronger than necessary. For
example $g > 0$ may be replaced by a weaker condition where $g$ is positive in a
region about $t = c_1(\theta)$. On the other hand condition (ii) on
$\Pr\{|\hat\theta - \theta| > \delta\}$ in Theorem 5 is necessary to the argument used in the proof. It is easy
to construct trivial examples where the results of this theorem apply although 
this condition is not satisfied. However an example has also been constructed 
where all the conditions of Theorem 5 hold except for this condition and the 
method of Wald fails to give the results. 

These theorems are very easily extended to the $k$-dimensional parameter case
by replacing the conditions on the derivatives with respect to $\Delta$ by the same
order mixed derivatives with respect to $\Delta_1, \Delta_2, \cdots, \Delta_k$ of

$g(t, \hat\theta_1 + \Delta_1, \hat\theta_2 + \Delta_2, \cdots, \hat\theta_k + \Delta_k \mid \theta_1 + \Delta_1, \cdots, \theta_k + \Delta_k)$.

The symmetric case arises when the distribution of $\hat\theta$ is almost symmetric
about $\theta$. More exactly we have





Theorem 6. If 

(i) All the conditions of Theorem 4 hold and L0, 6) has the additional 'property that 


f 0 - efL0, 0 ) d& < M(di, 02 )N-\ ei<0< 02, 

J—OO 


and 

(ii) 




< L0, 0)\d - 0\, 


|aA‘a<' ' dA*dP 

Ci<t<C 2 , 01<0 < 02 , i + j<s-l, 

then it is possible to construct a sequence ci0), C2{0), • • • , c*{0), as in Theorem 4 
SO that 

I df\0) I < Mi02 , 0,)N~^”'-^\ 

p = 0, I, s — 2 m + 2, 01 < 0 < 0i 
I cZ^^\0) - cif\0) I < M(0i, 02 )N-”, 


p — 0, I, - - - , s — 2 m + I, 01 < 0 < 02 ; 
I I < M(0i, 02 )n-^’"-^\ 

p = 0, 1, — 2m + 2, —00 < 0 < oo; 

and 

I Pr{< < c*i0) + • • • + c*0)\ -a\<M{0i, 02 )N-'\ 

»,<»<»., r = [“4-‘] 

Theorem 5 can also be extended to the symmetric case. 

It is often possible in the theory of statistics to obtain an asymptotic expansion 
of the distribution of t, The treatment of such cases is often very simple 
because of the prominent role played by the normal distribution in such 
asymptotic expansions. Suppose that 

g(t, d I <?) = y/Nyit, I e), 

where = -s/N0 — 0)‘,y — density distribution of {t,4d) 

1^" I ~ yo(t, 4'\0) A- N~^'\i{,t, i/\0) A- • • • + yt-\{t, ^ 1 0 ) 

+ p{t, ^ 1 0)N-'\ 

7q »7i»• • • > Y»-i are independent of N; 

f J loid^dt < M(0 i,02), 0i<0<02; 

f J lyild^dt < M(0 i,02), 0i<0<02. 





Correspondingly we have 

git, ^ I (») - (/,«, ^ I fl) + ir^'^it, d 1 ff) + ... + 

9,-iH, « I «) + r«, i 1 9)2^*^ 

where 


giit, d I tf) = VN yiit, ^ I e), rit, ^ | <») = VNp(t, f | 0). 

fciii) 


Then if we define ci(S) by £ dtgo = a, 

/ «0 ^Cl (•) + •••+«!«—l(*)+C»»(^) 

Si ^ diffo 

00 -h«m—1( 




(<) 


d%o + + ••• + 


or by 

Cmio) Sgoiclio), d I e) 

/ » /•ei(*)+-.;+c™_i(») 

dd / d%o + 

00 J—00 


we obtain 

I Pr{< < c, (^) + • • • + c,0)] - a I < MiOi, 

if $g$ obeys the conditions of Theorem 4 except that we need only $s - i + 1$
derivatives for $g_i(t, \hat\theta \mid \theta)$. The above definitions of $c_m(\theta)$ correspond to the
$c_m^*(\theta)$ in Theorem 4. Analogues of Theorems 5 and 6 also apply to the asymptotic
case.

REFERENCE 

[1] B. L. Welch, “On the studentization of several variances,” Annals of Math. Stat.,
Vol. 18 (1947), p. 118.



SOME LOW MOMENTS OF ORDER STATISTICS 
By H. J. Godwin 
University College of Swansea, Wales 

1. Introduction. In a paper on order statistics from several populations 
[1], there were given, among other results, the means, variances, covariances, and 
correlations of order statistics in samples of ten or less from a normal population. 
These were obtained by numerical integration, and on account of the difficulties 
arising therefrom, some results were given to only two decimal places. More 
recently, Jones [3] has shown that some of the integrals, for sample sizes not 
greater than four, can be evaluated explicitly. 

In this note these results are supplemented in two ways. For a paper which 
the author has recently submitted to Biometrika integrals were evaluated which 
can be used to give some of the results in [1] to more places of decimals. It is 
also shown that the table of explicit values can be extended. 


2. Approximate values. Let the population studied be normal with mean 
zero and variance unity, and let the members of a sample of n be x(l | n) > 
x{2\n) > • • • >x{n\n). The integrals available are 


yp{i) 


i'ih j) 


= F‘(x)(l - F(x)ydxil < i < 5), and 

= F'{x) / (1 ~ ^ ^ < lo). 


where 

These were evaluated to ten places of decimals, the last place possibly being in 
error by one or two units. 

For the purpose in hand we define also 

= f xF'(x){l - F(x)ydx = -a(j,i), 

J—00 

and 

fix)F\x)(l - F(x)ydx = 

Now, on integrating by parts, we have 

^ (1 - F(y)) dy = -«(1 - F{x)) + ^ yf{y) dy, 



and for/(®) as defined above (so that in what follows we restrict ourselves to the 
normal distribution only), the second integral is/(a:). Hence ipd, 1) + a{i, 1) = 
1 /(i + 1) and we can construct a table of a’s by using also the relation 

a(t, j) - a(i + 1, j) = a(i, j + 1). 

Again, on integrating by parts, we have 

/3(t, *■) = / . , - F{x)y~^ - 2a;(l - F(a;))*} dx(i > 0) 

J-ao i “T 1 

- . ! , - 1, f - 1) - 0iiy t )} - a(i + 1, t), 

using the fact that, in this particular case, 2F — 1 is an odd function and 
F(1 — F) an even function of x. 

i 2 

Hence /3(t, i) = 2"(2irFT) - 1, * - 1) - »)i using 

/3(b j) — /3(* + 1, i) = i + 1) we can find the ff’s. 

Finally we put yii, j) = ijg shown by 

tj 

an integration to be equal in this case to y(j, i). 

Now 

(1) $E\big(x(i|n) - x(i+1|n)\big) = {}^nC_i \int_{-\infty}^{\infty} F^{n-i}(x)\,(1 - F(x))^i\,dx$,

as was proved by Irwin [2]. By the symmetry here this integral is the same
if $i$, $n - i$ are interchanged, and since $F^a(1 - F)^b + F^b(1 - F)^a$ is a polynomial
in $F(1 - F)$ (as may be seen by putting $F = \tfrac12 + G$) the integrals (1) can be
expressed in terms of the $\psi(i)$. Using the fact that the expected value of the
median is zero the $E(x(i|n))$ follow.

The frequency function of a: = x(i | n) is 

and so 

(2) E(x(i I n))“ = i ”C< /3(t — 1, n — i). 

The joint frequency function of a:,- = a:(t | n) and Xj = x(j | n) is 

!)!(»- 

(taking j > i), and to find E{xi Xj) we multiply by x,- x,- and integrate, x,' going 





from —CO to 00 . and a:, from Xj to oo. On expanding (1—F(a:,) — (1 —F(xi)))^* * 
by the multinomial theorem a typical term is 

(3) r rXi Xi fixdfix,) (1 - F(xi))dx, dxt. 

J-OO Jxi 


TABLE 1

Means and standard deviations

Statistic   Mean        Standard     Statistic   Mean        Standard
                        Deviation                            Deviation
x(1|2)       .5641896   .8256453     x(1|8)      1.4236003   .6106530
x(1|3)       .8462844   .7479754     x(2|8)       .8522249   .4892862
x(2|3)      0           .6698292     x(3|8)       .4728225   .4480723
x(1|4)      1.0293754   .7012241     x(4|8)       .1525144   .4326503
x(2|4)       .2970114   .6003793     x(1|9)      1.4850132   .5977903
x(1|5)      1.1629645   .6689799     x(2|9)       .9322975   .4750755
x(2|5)       .4950190   .5581388     x(3|9)       .5719708   .4317205
x(3|5)      0           .5355685     x(4|9)       .2745259   .4129877
x(1|6)      1.2672064   .6449241     x(5|9)      0           .4075553
x(2|6)       .6417550   .5287511     x(1|10)     1.5387527   .5868083
x(3|6)       .2015468   .4961981     x(2|10)     1.0013571   .4631674
x(1|7)      1.3521784   .6260334     x(3|10)      .6560591   .4183339
x(2|7)       .7573743   .5066882     x(4|10)      .3757647   .3974153
x(3|7)       .3527070   .4687447     x(5|10)      .1226678   .3886565
x(4|7)      0           .4587449


We integrate by parts with respect to Xi and then with respect to x,-: the integral 
(3) is then seen to be 7(i + »", w — j + s + 1), and 


(4) 


E{xiX,) _ . _ j)_ j.), Z Z 


(-ir*(j-i-1)! 

r!8!(j — i — 1 — r — s)! 


7(i + r, n - ;■ + s + 1). 


Using (1), (2) and (4), the values in Tables 1, 2, and 3 are obtained. The 
values are estimated to be correct, except for sample sizes 9 and 10, for which 
there may be errors of one or two units in the last place given. Missing values 
are filled in by considerations of symmetry. 
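
The entries of Table 1 can be spot-checked by direct numerical integration. The sketch below is not the author's method (which goes through the tabulated integrals and the relations (1), (2) and (4)); it assumes the standard frequency function of the $i$-th largest of $n$ normal observations, $\frac{n!}{(i-1)!\,(n-i)!}F^{n-i}(x)(1-F(x))^{i-1}f(x)$, and uses a plain Simpson rule. The function names and the integration range $[-10, 10]$ are illustrative choices.

```python
import math


def pdf(x):
    """Standard normal frequency function f(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def cdf(x):
    """Standard normal distribution function F(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def simpson(func, lo, hi, steps=4000):
    """Composite Simpson rule on [lo, hi]; steps must be even."""
    h = (hi - lo) / steps
    total = func(lo) + func(hi)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * func(lo + k * h)
    return total * h / 3.0


def order_stat_moment(i, n, power):
    """E[x(i|n)**power], where x(1|n) is the largest of n observations."""
    coef = math.factorial(n) / (math.factorial(i - 1) * math.factorial(n - i))

    def integrand(x):
        big_f = cdf(x)
        return coef * x ** power * big_f ** (n - i) * (1.0 - big_f) ** (i - 1) * pdf(x)

    return simpson(integrand, -10.0, 10.0)


def mean_and_sd(i, n):
    """Mean and standard deviation of x(i|n)."""
    m1 = order_stat_moment(i, n, 1)
    m2 = order_stat_moment(i, n, 2)
    return m1, math.sqrt(m2 - m1 * m1)


def expected_gap(i, n):
    """Right-hand side of (1): nCi * integral of F^(n-i) (1-F)^i."""
    return math.comb(n, i) * simpson(
        lambda x: cdf(x) ** (n - i) * (1.0 - cdf(x)) ** i, -10.0, 10.0
    )


if __name__ == "__main__":
    for i, n in [(1, 2), (1, 5), (2, 5), (3, 5)]:
        print((i, n), mean_and_sd(i, n))
```

For $n = 2$ the gap formula (1) and the direct means should agree, since $E(x(1|2) - x(2|2)) = 2E(x(1|2))$ by symmetry.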


3. Exact values. All the integrals occurring for i/{i) or ^{i, j) can, by suitable 
transformations, and the integration of one variable over the range —co to <», 



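TABLE 2

[Only part of this table survives. The legible entries are variances and covariances of x(i|n) and x(j|n); ? marks an illegible entry.]

n = 10

 i   j=1     2       3       4       5       6       7       8       9       10
 1  .34434  .17126  .11626  .08825  .07074  .05840  .04892  .04108  .03404    ?
 2          .21452  .14662  .11170  .08974  .07420  .06222  .05232  .04336
 3                  .17500  .13380  .10774  .08923  .07492  .06302
 4                          .15794  .12751  .10579  .08895
 5                                  .15105  .12560

n = 9: only .35735 and .17814 (i = 1; j = 1, 2) and .22570 (i = 2; j = 2) are legible.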


TABLE 3

Correlations between order statistics

(? marks an entry that is illegible in the source.)

 n  i   j=2    3      4      5      6      7      8      9      10
 2  1  .4669
 3  1  .5502  .2947
 4  1  .5834  .3753  .2129
    2         .6546
 5  1  .6008  .4135  .2833  .1658
    2         .6973  .4813
 6  1    ?    .4357    ?    .2269  .1355
    2           ?    .5323  .3788
    3                .7444
 7  1  .6185    ?    .3429  .2609  .1889  .1143
    2         .7346  .5624  .4293  .3115
    3                .7699  .5899
 8  1    ?      ?    .3585    ?      ?    .1617    ?
    2         .7444  .5823    ?    .3591  .2642
    3                .7859    ?    .4872
    4                       .7969
 9  1  .6273  .4679  .3699  .2986  .2409  .1902  .1412  .0869
    2         .7514  .5964  .4827  .3902  .3083  .2291
    3                .7969  .6466  .5236  .4144
    4                       .8139  .6606
10  1  .6301  .4736  .3784  .3102  .2561  .2098  .1674  .1252  .0777
    2         .7567  .6068  .4985  .4122  .3380  .2700  .2021
    3                .8048  .6627  .5488  .4507  .3601
    4                       .8255  .6849  .5632
    5                              .8315


be represented as multiples of $\int_0^{\infty}\cdots\int_0^{\infty} e^{-Q}\,dx\,dy\cdots$, where $Q$ is a positive-definite
quadratic form in the variables of integration.

















Now if $Q$ is $ax^2$, the integral is $\tfrac12\sqrt{\pi/a}$ (this is, in effect, stated by Jones).
By elementary integration we have also that if $Q = ax^2 + 2hxy + by^2$, the
integral is

$\dfrac{1}{2\sqrt{ab - h^2}}\left\{\dfrac{\pi}{2} - \arctan\dfrac{h}{\sqrt{ab - h^2}}\right\}$


TABLE 4
Exact expected values

x(1|4):          $\sqrt{\pi}\,[(2/5)a + (2/5)c]$
x(2|4):          $\sqrt{\pi}\,[(2/5)a - (6/5)c]$
x(1|5):          $\sqrt{\pi}\,[(1/3)a + c]$
x(2|5):          $\sqrt{\pi}\,[(2/3)a - 2c]$
x(3|5):          0

x(1|5)^2:        1 + b + d
x(2|5)^2:        1 - 4d
x(3|5)^2:        1 - 2b + 6d
x(1|5)x(2|5):    b + d
x(1|5)x(3|5):    2a - 2b - 2d - f
x(1|5)x(4|5):    -2a + 3f
x(1|5)x(5|5):    -2f
x(2|5)x(3|5):    -2a + 3b - d + f
x(2|5)x(4|5):    4a - 4b + 4d - 4f

x(1|6)^2:        1 + b + 3d
x(2|6)^2:        1 + b - 9d
x(3|6)^2:        1 - 2b + 6d
x(1|6)x(2|6):    b + 3d
x(1|6)x(3|6):    3a - 2b + 3c - 6d - 3f
x(1|6)x(4|6):    -3a - 9c + 9f
x(1|6)x(5|6):    12c - 6f
x(1|6)x(6|6):    -6c
x(2|6)x(3|6):    -3a + 4b - 3c + 3f
x(2|6)x(4|6):    9a - 6b + 9c + 6d - 15f
x(2|6)x(5|6):    -6a - 18c + 18f
x(3|6)x(4|6):    -6a + 6b - 6d + 6f



and if $Q$ is $ax^2 + by^2 + cz^2 + 2fyz + 2gzx + 2hxy$, the integral is

$\dfrac{1}{4}\sqrt{\dfrac{\pi}{\Delta}}\left\{\dfrac{\pi}{2} + \arctan\dfrac{gh - af}{\sqrt{a\Delta}} + \arctan\dfrac{hf - bg}{\sqrt{b\Delta}} + \arctan\dfrac{fg - ch}{\sqrt{c\Delta}}\right\}$,

where $\Delta = abc + 2fgh - af^2 - bg^2 - ch^2$.

The author has not succeeded in obtaining similar results with a higher
number of variables; it is possible that elementary functions no longer suffice
then.
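
The two-variable closed form can be checked numerically. The sketch below assumes the integral is taken over the positive quadrant $x, y \ge 0$ and reads the closed form as $\frac{1}{2\sqrt{ab-h^2}}\{\frac{\pi}{2} - \arctan\frac{h}{\sqrt{ab-h^2}}\}$; the midpoint rule, the cutoff `upper` and the step count are illustrative choices, not part of the original argument.

```python
import math


def quadrant_integral_closed_form(a, b, h):
    """Closed form for the integral of exp(-(a x^2 + 2 h x y + b y^2)) over x, y >= 0."""
    disc = a * b - h * h
    if a <= 0.0 or disc <= 0.0:
        raise ValueError("the quadratic form must be positive definite")
    root = math.sqrt(disc)
    return (math.pi / 2.0 - math.atan(h / root)) / (2.0 * root)


def quadrant_integral_numeric(a, b, h, upper=12.0, steps=600):
    """Midpoint-rule approximation on the square [0, upper] x [0, upper]."""
    width = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * width
        for j in range(steps):
            y = (j + 0.5) * width
            total += math.exp(-(a * x * x + 2.0 * h * x * y + b * y * y))
    return total * width * width


if __name__ == "__main__":
    for h in (-0.5, 0.0, 0.5):
        print(h, quadrant_integral_closed_form(1.0, 2.0, h),
              quadrant_integral_numeric(1.0, 2.0, h))
```

With $h = 0$ the closed form reduces to the product of two one-variable integrals, $\tfrac12\sqrt{\pi/a}\cdot\tfrac12\sqrt{\pi/b}$, which gives a quick consistency check.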








Using these results we can obtain exact expressions for $\psi(2)$ and $\varphi(i, j)$
for $1 \le i, j$; $i + j \le 6$, which give, in addition to Jones’ results, the exact expected
values in Table 4, wherein

$a = 15/4\pi = 1.19366\ 20732$,

$b = 5\sqrt{3}/4\pi = .68916\ 11193$,

$c = (15/2\pi^2)\ \text{arc sin}\ (1/3) = .25824\ 50843$,

$d = (5\sqrt{3}/2\pi^2)\ \text{arc sin}\ (1/4) = .11085\ 93167$,

$f = (15/\pi^2)\ \text{arc sin}\ (1/\sqrt{6}) = .63913\ 55493$.
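
As a consistency check between the exact values and the approximate ones, the sketch below evaluates the constants $a, b, c, d, f$, the exact means for $n = 4, 5$, and the $n = 5$ second moments and product moments from Table 4. It then compares them with Table 1 and with the identity $\sum_j E(x(i|n)\,x(j|n)) = 1$, which holds for a unit normal population because the sample mean is independent of the deviations from it. The dictionary layout and function names are hypothetical; entries not listed in Table 4 are filled in by the reflection symmetry $x(i|n) \to -x(n+1-i|n)$.

```python
import math

# Constants as defined in the text.
a = 15.0 / (4.0 * math.pi)
b = 5.0 * math.sqrt(3.0) / (4.0 * math.pi)
c = 15.0 / (2.0 * math.pi ** 2) * math.asin(1.0 / 3.0)
d = 5.0 * math.sqrt(3.0) / (2.0 * math.pi ** 2) * math.asin(1.0 / 4.0)
f = 15.0 / math.pi ** 2 * math.asin(1.0 / math.sqrt(6.0))
ROOT_PI = math.sqrt(math.pi)

# Exact means from Table 4, keyed by (i, n).
exact_mean = {
    (1, 4): ROOT_PI * (0.4 * a + 0.4 * c),
    (2, 4): ROOT_PI * (0.4 * a - 1.2 * c),
    (1, 5): ROOT_PI * (a / 3.0 + c),
    (2, 5): ROOT_PI * (2.0 * a / 3.0 - 2.0 * c),
    (3, 5): 0.0,
}

# Exact E[x(i|5) x(j|5)] from Table 4, keyed by (i, j) with i <= j.
product_5 = {
    (1, 1): 1 + b + d,
    (2, 2): 1 - 4 * d,
    (3, 3): 1 - 2 * b + 6 * d,
    (1, 2): b + d,
    (1, 3): 2 * a - 2 * b - 2 * d - f,
    (1, 4): -2 * a + 3 * f,
    (1, 5): -2 * f,
    (2, 3): -2 * a + 3 * b - d + f,
    (2, 4): 4 * a - 4 * b + 4 * d - 4 * f,
}


def product_moment(i, j, n=5, table=product_5):
    """E[x(i|n) x(j|n)], using reflection symmetry for entries the table omits."""
    i, j = min(i, j), max(i, j)
    if (i, j) in table:
        return table[(i, j)]
    return table[(n + 1 - j, n + 1 - i)]


def row_sum(i, n=5):
    """Sum over j of E[x(i|n) x(j|n)]; should equal 1 for a unit normal population."""
    return sum(product_moment(i, j, n) for j in range(1, n + 1))


def exact_sd(i, n=5):
    """Standard deviation of x(i|n) from the exact second moment and mean."""
    return math.sqrt(product_moment(i, i, n) - exact_mean[(i, n)] ** 2)


if __name__ == "__main__":
    print({key: round(value, 7) for key, value in exact_mean.items()})
    print([round(exact_sd(i), 7) for i in (1, 2, 3)])
    print([row_sum(i) for i in (1, 2, 3)])
```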

REFERENCES 

[1] C. Hastings, Jr., F. Mosteller, J. W. Tukey and C. P. Winsor, “Low moments for
small samples: a comparative study of order statistics,” Annals of Math. Stat.,
Vol. 18 (1947).

[2] J. O. Irwin, “The further theory of Francis Galton's individual-difference problem,”
Biometrika, Vol. 17 (1925).

[3] H. L. Jones, “Exact lower moments of order statistics in small samples from a normal
distribution,” Annals of Math. Stat., Vol. 19 (1948).



ON A THEOREM OF HSU AND ROBBINS 

By P. Erdős
Syracuse University 

Let $f_1(x), f_2(x), \cdots$ be an infinite sequence of measurable functions defined
on a measure space $X$ with measure $m$, $m(X) = 1$, all having the same distribution
function $G(t) = m(x; f_k(x) < t)$. In a recent paper Hsu and Robbins¹
prove the following theorem: Assume that

(1) $\int_{-\infty}^{\infty} t\,dG(t) = 0$,

(2) $\int_{-\infty}^{\infty} t^2\,dG(t) < \infty$.

Denote by $S_n$ the set $\left(x; \left|\sum_{k=1}^{n} f_k(x)\right| > n\right)$, and put $M_n = m(S_n)$. Then $\sum_{n=1}^{\infty} M_n$
converges.

It is clear that the same holds if $\left|\sum_{k=1}^{n} f_k(x)\right| > n$ is replaced by $\left|\sum_{k=1}^{n} f_k(x)\right| > c\cdot n$
(replace $f_k(x)$ by $c\cdot f_k(x)$).

It was conjectured that the conditions (1) and (2) are necessary for the
convergence of $\sum_{n=1}^{\infty} M_n$. Dr. Chung pointed out to me that in this form the
conjecture is inaccurate; to see this it suffices to put $f_k(x) = \tfrac12(1 + r_k(x))$ where
$r_k(x)$ is the $k$th Rademacher function. Clearly $|f_k(x)| \le 1$; thus $M_n = 0$,
thus $\sum_{n=1}^{\infty} M_n$ converges, but $\int_{-\infty}^{\infty} t\,dG(t) \ne 0$. On the other hand we shall show
in the present note that the conjecture of Hsu and Robbins is essentially correct.
In fact we prove

Theorem I. The necessary and sufficient condition for the convergence of
$\sum_{n=1}^{\infty} M_n$ is that

(1′) $\left|\int_{-\infty}^{\infty} t\,dG(t)\right| < 1$,

and (2) should hold.

In proving the sufficiency of Theorem I, we can assume without loss of generality
that (1) holds. It suffices to replace $f_k(x)$ by $(f_k(x) - C)$ where $C = \int_{-\infty}^{\infty} t\,dG(t)$.
The following proof of the sufficiency of Theorem I (in other words essentially for
the theorem of Hsu and Robbins) is simpler and quite different from theirs.
Put

(3) $a_i = m(x; |f_k(x)| > 2^i)$;

¹ Proc. Nat. Acad. Sciences, 1947, pp. 25-31.



since the $f_k$'s all have the same distribution, $a_i$ clearly does not depend on $k$.
We evidently have 

- a.+x) < 

t ■*0 J —90 i ""0 { mQ 

Thus (2) is equivalent to

(4) $\sum_{i=0}^{\infty} 2^{2i} a_i < \infty$.

Let 2‘ < n < 2‘+\ Put 

Sn^ = (®; fk(x) I > 2*~“, for at least one k < n), 

Sn^ = (x; fki(x) I > n*'‘, |/t,(x) | > n*'‘, for at least two ki < n, fcj < n), 
-Si” = (x; tfUx) > 2‘-^), 

A:-l 

where the dash indicates that the k with j /*(x) ] > n*'^ are omitted. We 
evidently have 

-S„ C U U ;Sl”. 

For if X is not in Sl” U Sl” U Sn \ then clearly 

Z/*(x) < 2*-^ + 2*-* < n. 

00 

Thus to prove the convergence of Z Mn it will suffice to show that 

n-»l 

(5) Z (»«(-Sl”) + m(Sl”) + m(-Sl”)) < oo. 

n—1 

From (3) we obtain that m(Sn^) < n-Oi _2 < 2’'’’‘-ai_2. Thus from (4) 

(6) Z »i(<Sl”) = Z ]Z m(Sn^) < < <». 

n-l i —0 »<*•+> <-0 

From (4) we evidently have that for large u 

m(x; I /k(x) I > u) < l/^t^ 

Thus since the fs are independent and have the same distribution function it 
follows that for sufficiently large n, 

< Z Mx; I/*,(*) I > n*'\ |/*,(x) ) > n*'‘) 

< m(x-, l/i(x)j > n*'‘),m(x; |/ 2 (x) | > = n~‘'\ 

Z m(5l”) < 


Hence 

(7) 


00 . 







Put 

j/jt(a;) for |/fc(a:) | < n*'®; 

[O otherwise. 

Clearly the fk{x) are independent and have the same distribution function 
G+(t). Put 

( 8 ) 


f (dG^(t) = e, gt(x) = fk(z) - f. 

We have from (8) that J gk(x) dm = 0, and by (1) that « 0 as n —<». We 

evidently have 

f (Hgk(x)\ dm == f J^giix) dm + G f gUx)'g]{x) dm. 

Jx \k-l / Jxk-l JZls:t<ISn 

Now since max 1 gk(x) | < + e, 

j^gk{x)dm < (n®'® + «)* • f^glix) dm < cm*'®, 


and 


Thus 


.Hence 

(9) 


jf glix)-gKx) dm = gl{x) dm £ g\{x) dm < Ct. 


m 


/.(S 

(x-,\ igkix) 

\ I k-l 


dm < cjn*®'®. 


> n/16) < Ctii 


•(T/5) 


Thus from (8), (9), |/i(x) | < | gk{x) \ + 1/16 (for e < 1/16) and n/8 < 2’ * 
we have 


m 


(*;| t/ix) I > = m(x;| tjitix) | > 2*-®) 


< I > n/ 16 ^ < , 


or 

(10) m(5i”)<cn“‘"®\ 

Thus finally from (6), (7) and (10) we obtain (5) and this completes the proof 
of the sufficiency of Theorem I. 





Next we prove the necessity of Theorem I, in other words we shall show that if
$\sum_{n=1}^{\infty} M_n$ converges then (1′) and (2) hold.

First we prove (2). The following proof was suggested by Dr. Chung, who 
simplified my original proof. By a simple rearrangement we see that (2) is 
equivalent to 

(11) E n f d(?(t) < 00 

n—1 J|«|>«n 


for any c > 0; while 

( 12 ) 


f 1^1 dG(t) < 

J-oo 


00 


is equivalent to 
(13) 


f dG{£) < 00 

n«-l •/! (| >cn 


for any c > 0. Now we have clearly, 

(x; |/„(a;) | > 2n) C Sn-i U Sn . 


Hence 


E / dGit) ^ E (m(S„-i) + TO (S„)) < 00. 

n I t| >2n n 

Thus we obtain (12). Since the terms of this series are non-increasing it follows
that


(14) 


n f d6(t) —* 0. 

J|<|>2n 


Our assumption being that $\sum M_n < \infty$ we have $M_n \to 0$ as $n \to \infty$. It follows
that there is a constant $p > 0$ independent of $k$ and $n$ such that

$m\left(x; \left|\sum_{i=1,\, i \ne k}^{n} f_i(x)\right| < n\right) \ge p$.

Now, writing set intersections as products, we have 

U (x; 1 fk (x) 1 > 2n) -(x; E fiix) < n) C )S„ . 
t-i \ I i-i / 


tftk 


U (RkTk) CSn, 


k-l 


Writing this for a moment as 





where i?* — (x; |/*(®) j > 2n) etc. and denoting by R' the complement of R, 
we have 

= m(5„) ^ m (ie* n)) 

= m ( (fti T,y ... n,,)' n) 

= ZmaRiTiY ••• 

fc-1 

^ ZmCRj ••• «*'-x«kn) 

Ib-l 

^ Z {m(/e*-n) - U ... U /e*_i)Kt)I 

l(-l 

^ Z {fn(Tt) - (* - l)m(/Ji))m(ie*)} 

ib-l 

n m 

^ Z (p - n»n(Bi)}m(flt)^ Z (p - <r(l))m(ft*). 

ft-1 fc-l 

S p' 23 = «p' f dOit) 

h»l •'|<|>2n 


by (14) since m{Ri) = I dG(,t), nm{Ri) —^ 0 as 


n —* oo. 


Thus 


Zn f dG(t) g 4 Zm„ < «. 

II Jm1>2ii P n 

Hence we have (11), which is equivalent to (2). The proof of (1′) is quite easy.
By virtue of (2) we can put

$\int_{-\infty}^{\infty} t\,dG(t) = C$.

If $C > 1$, then it follows from (2) and Tschebycheff's inequality that $M_n \to 1$ as
$n \to \infty$, thus $C \le 1$. But if $C = 1$, we conclude from (2) and the central limit
theorem that $M_n$ does not tend to 0. Hence $C < 1$, and (1′) is proved.

By similar methods we can prove the following results: Let 2 < c < 4. Put 


= m 




OQ 

Then the necessary and sufficient condition for the convergence ofZ^fn' 





is that 


f tdG(t)=0, rii\‘dG(0<^. 

•Loo •Loo 


K c < 2 then the necessary and sufficient condition for the convergence of 
E is that f * 111‘ dG(t) < <». 

n««l •Loe 

Finally we can prove the following result: Assume that / t dG(t) = 0 and 

•Loo 

dG(t) < 00 . Then there exists a constant r so that 

00 

(17) H fx; I E fk(x) > • (log n)’’1 < <». 

n-I L I *-i J 


The case of the Rademacher functions shows that (17) cannot be improved
very much; in fact only the value of r could be improved.
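
For the Rademacher functions the quantities $M_n$ can be computed exactly, which gives a small numerical illustration of the Hsu and Robbins theorem in its scaled form (threshold $c\cdot n$ with $0 < c < 1$; with $c = 1$ every $M_n$ is trivially zero because $|\sum r_k| \le n$). The sketch below treats $r_1, \cdots, r_n$ as independent signs $\pm 1$ with probability $\tfrac12$ each, so that the sum is $2B - n$ with $B$ binomial. The function names are hypothetical, and the comparison with the bound $2\exp(-c^2 n/2)$ uses Hoeffding's inequality, which is not part of the original paper.

```python
import math


def tail_probability(n, c):
    """M_n = Pr(|r_1 + ... + r_n| > c*n) for independent signs r_k = +1 or -1."""
    favourable = sum(
        math.comb(n, j) for j in range(n + 1) if abs(2 * j - n) > c * n
    )
    return favourable / 2 ** n


def partial_sum(limit, c):
    """Partial sum M_1 + ... + M_limit of the Hsu-Robbins series."""
    return sum(tail_probability(n, c) for n in range(1, limit + 1))


if __name__ == "__main__":
    c = 0.5
    for limit in (50, 100, 200, 400):
        print(limit, partial_sum(limit, c))
```

The printed partial sums stabilise quickly, in line with the convergence of $\sum M_n$ when the mean is zero and the second moment is finite.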



NOTES 

This section is devoted to brief research and expository articles on methodology and
other short items.

BROWNIAN MOTION ON THE SURFACE OF THE 3-SPHERE

By Kôsaku Yosida

Mathematical Institute, Nagoya University

1. Introduction. Let $S$ be an $n$-dimensional compact Riemann space with the
metric $ds^2 = g_{ij}(x)\,dx^i\,dx^j$ such that the totality $G$ of the isometric transformations
of $S$ onto $S$ constitutes a Lie group transitive on $S$. Consider a temporally
homogeneous Markoff process by which $P(t, x, y)$, $t > 0$, is the transition probability
that a point $x$ is transferred to $y$ after the elapse of $t$-unit time. We
assume that $P(t, x, y)$ is a Baire function in $(t, x, y)$ and continuous in $t$; then $P$
satisfies Smoluchowski’s equation

(1.1) $P(t + s, x, y) = \int_S P(t, x, z)P(s, z, y)\,dz$ $(t, s > 0)$,

$dz$ being the $G$-invariant measure $\sqrt{g(x)}\,dx^1\,dx^2 \cdots dx^n$, $g(x) = \det(g_{ij}(x))$,

(1.2) $P(t, x, y) \ge 0$,

(1.3) $\int_S P(t, x, y)\,dy = 1$.

The spatial homogeneity of the transition process may be defined by

(1.4) $P(t, Tx, Ty) = P(t, x, y)$ for $T \in G$.

The “continuity” of the transition process may be defined, following A.
Kolmogoroff and W. Feller,¹ as follows. Let $L_1(S)$ be the function space of
integrable (with respect to $dx$) functions $f(x)$ on $S$, then, for those $f(x)$ which are
dense in $L_1(S)$,

(1.5) $\dfrac{\partial f(t, x)}{\partial t} = A\cdot f(t, x)$, $(t \ge 0)$;

$f(t, x) = \int_S f(y)P(t, y, x)\,dy$, $(t > 0)$, $f(0, x) = f(x)$,

where, with non-negative $b^{ij}(x)$,

(1.6) (AfiU) = (- Vs® a'WM) 

¹ A. Kolmogoroff, “Zur Theorie der stetigen zufälligen Prozesse,” Math. Annalen, Vol.
108 (1933); W. Feller, “Zur Theorie der stochastischen Prozesse,” Math. Annalen, Vol. 113
(1937).




The temporally and spatially homogeneous “continuous” Markoff process
may, if it exists, be called a Brownian motion on the homogeneous space $S$.
The purpose of the present note is to show that, under some derivability hypothesis
concerning $a^i(x)$ and $b^{ij}(x)$, there exists one and (essentially) only one Brownian
motion on the surface of the 3-sphere $S^2$.

I here express my hearty thanks to Dr. Kiyosi Itô who proposed to me the
problem and discussed and much improved the manuscript.


2. The defining equation for the Brownian motion. The spatial homogeneity 
(1.4) is equivalent to the fact that A is commutative with every operator f de¬ 
fined by 

(2.1) {ff)ix)=f(Tx), TtG, 

because we have 


f f(y)P(t, y, Tx) dy = f f(Ty)Pil, Ty, Tx) dTy = f S{Ty)P{t, y, x) dy. 
Ja Ja Ja 

The condition (2.1) is equivalent to 

(2.2) XA = .4Z for any infinitesimal operator X = {*(x) ^ 


induced on S by the infinitesimal operator of the Lie group G. Thus, assuming 
the derivability of o*(x) and b'^(x) of necessary orders, we obtain from (2.2) the 
conditions: 


(2.3) 


(2.4) 


_ 1 _ 


tVrl — ( ^ &G\x) \ ^ „ 

^ ^ ^ ax* \\/?(») ) ’ 

(g‘W - 

±{-L- 

ax‘ w g{x) 


df{x) 


H\x) -1- 6”(x) = f(x) 


aV(x) 


dx'dx’ 



- G‘(») + ^ (VjW !•■' (*)), 

.5) 

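Now for the surface of the 3-sphere $S^2$,

$ds^2 = d\theta^2 + \sin^2\theta\,d\varphi^2$, $g(\theta, \varphi) = \sin^2\theta$,

and the infinitesimal operators

$X_x = \sin\varphi\,\dfrac{\partial}{\partial\theta} + \dfrac{\cos\theta\cos\varphi}{\sin\theta}\,\dfrac{\partial}{\partial\varphi}$,

$X_y = \cos\varphi\,\dfrac{\partial}{\partial\theta} - \dfrac{\cos\theta\sin\varphi}{\sin\theta}\,\dfrac{\partial}{\partial\varphi}$,

$X_z = \dfrac{\partial}{\partial\varphi}$

respectively correspond to the rotations about the x-, y- and z-axis.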

From (2.5) we see that, by taking X = X,, 

(2.6) <f>) is independent of ip. 

By taking X = X, in (2.4) we see that is independent of <p. Hence, by (2.6), 

(2.7) a\d, ip) is independent of <p. 

Thus, by taking fc = 1, X = X* we obtain from (2.4), 

1 Tr2//.\_ 1.22//,\ ^ f 1 




and thus 

(2.8) H\e) = 0, -f ~ H\d)^ = 0. 

Hence, by taking k = 2, X = X, or X = X,, we obtain from (2.4) 

-H\e) cos <p , ^11 cos e cos <p - b^\e) coa g cos y _ ^ 

sin* 0 ^ ^ ®'"* ® ^ a '' oin a ’ 


sm*fi 


sin* 0 sin* 0 

From these two equations we obtain 


sin 0 

- 2b^\e) -f 2b^i0) ^ + 6“«» 

ain* A ' ain* A ' Sm* 0 


sin 0 

, 22 /-V cos 0 sin <p _ 


sin 0 


= 0 . 


(2.9) 




By taking i = 2, k = 1, X = X,, we obtain from (2.5), (2.9) 
, 22 /y.\ I i.u/.i\ d I cos 0 cos £\ 

and hence 

( 2 . 10 ) 


^(0) = 

sm*9 


Similarly by taking i = 1, A: = 1, X = X, we obtain from (2.5) 

db^\0) 


6“(tf) cos ip -f 6“(tf) cos ^ = sin ^' 


d0 


and hence by (2.9), (2.10) 


( 2 . 11 ) 


6“(e) = constant C, 6“(0) = . 

sin' 0 


Thus we obtain from (2.4) 





H^i$) = —a‘(tf) sin + 2C cos $, H®(e) = — sin d-a^(d) 

and thus, by (2.8), 

(2.12) a\e) = 0. 

Substituting (2.11) in (2.9) we obtain 


(2.13) 


a\d) 


C cos 9 
sin d 


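Therefore since $b^{11}(\theta)$ and $b^{22}(\theta)$ are non-negative, $A$ is (essentially) equal to
the Laplace operator

(2.14) $\Lambda = \dfrac{1}{\sin\theta}\,\dfrac{\partial}{\partial\theta}\sin\theta\,\dfrac{\partial}{\partial\theta} + \dfrac{1}{\sin^2\theta}\,\dfrac{\partial^2}{\partial\varphi^2}$.

Thus we may obtain $P(t, x, y)$ by integrating the equation

(2.15) $\dfrac{\partial f(t, \theta, \varphi)}{\partial t} = \Lambda\cdot f(t, \theta, \varphi)$

and by putting

(2.16) $f(t, \theta, \varphi) = f(t, x) = \int_{S^2} f(y)P(t, y, x)\,dy$.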


3. Integration of the equation (2.15)-(2.16). Consider the Laplacian (real)
spherical harmonics

(3.1) $Y_k^{(m)}(\theta, \varphi) = Y_k^{(m)}(x)$, $(-k \le m \le k;\ k = 0, 1, \cdots)$.

They constitute an orthonormal function system complete for continuous
functions on $S^2$, and we have

(3.2) $\Lambda\cdot Y_k^{(m)}(\theta, \varphi) = -k(k + 1)\,Y_k^{(m)}(\theta, \varphi)$.

Since, as is well-known,

(3.3) $Y_k^{(m)}(T^{-1}x) = \sum_{n=-k}^{k} u_{mn}^{(k)}(T)\,Y_k^{(n)}(x)$

by an irreducible orthogonal representation $(u_{mn}^{(k)}(T))$ of the rotation group $G$,
we have

(3.4) $\max_x |Y_k^{(m)}(x)|^2 \le (2k + 1)\,\min_x \sum_{n=-k}^{k} |Y_k^{(n)}(x)|^2$,

by applying the Schwarz inequality and the transitivity of the group $G$ on
$S^2$. The right hand member satisfies, by the orthonormality,

(3.5) $\le (2k + 1)^2/(\text{area of } S^2)$.

Therefore the double series (for $t > 0$)

(3.6) $P(t, x, y) = \sum_{k=0}^{\infty}\sum_{m=-k}^{k} \exp(-k(k + 1)t)\,Y_k^{(m)}(x)\,Y_k^{(m)}(y)$

is absolutely and uniformly convergent on $S^2 \times S^2$. We will show that this $P$ is
the required (unique) Brownian motion on $S^2$.
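
The series (3.6) can be evaluated numerically once the inner sum over $m$ is collapsed. The sketch below relies on the classical addition theorem for spherical harmonics on the unit sphere (area $4\pi$), $\sum_{m} Y_k^{(m)}(x)Y_k^{(m)}(y) = \frac{2k+1}{4\pi}P_k(\cos\gamma)$, where $\gamma$ is the angle between $x$ and $y$ and $P_k$ is the Legendre polynomial; this reduction is an added step, not one taken in the text. The truncation level `k_max` and the Simpson rule in $u = \cos\gamma$ are illustrative choices. It then checks properties (1.2) and (1.3) for a fixed $t$.

```python
import math


def legendre_values(u, k_max):
    """Values P_0(u), ..., P_k_max(u) from the three-term (Bonnet) recurrence."""
    values = [1.0, u]
    for k in range(1, k_max):
        values.append(((2 * k + 1) * u * values[k] - k * values[k - 1]) / (k + 1))
    return values[: k_max + 1]


def transition_density(t, cos_gamma, k_max=60):
    """Truncated series (3.6) as a function of t and the cosine of the angle between x and y."""
    p = legendre_values(cos_gamma, k_max)
    return sum(
        (2 * k + 1) / (4.0 * math.pi) * math.exp(-k * (k + 1) * t) * p[k]
        for k in range(k_max + 1)
    )


def total_mass(t, steps=2000):
    """Integral of P(t, x, y) dy over the sphere, using dy = 2*pi*du with u = cos(gamma)."""
    h = 2.0 / steps
    total = transition_density(t, -1.0) + transition_density(t, 1.0)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * transition_density(t, -1.0 + j * h)
    return 2.0 * math.pi * total * h / 3.0


if __name__ == "__main__":
    for t in (0.1, 0.5, 2.0):
        print(t, total_mass(t), transition_density(t, -1.0), transition_density(t, 1.0))
```

As $t$ grows the density flattens towards the uniform value $1/4\pi$, since only the $k = 0$ term survives.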

The proof may be given in three steps, i) We see by (3.2) and (3.6), that 

f f(y)P(t, y, x) dx satisfies (2.15) if 

J 89 

fix) ~ £ i: di”'^Yir\x), £ £ exp i-kik + mkik + D di”'^Yi”\x) 

Jb—O wi"—A; A—0 tn<"^k 


are both absolutely and uniformly convergent. By the completeness of { (x )}, 
such f(x) are dense in Li(S). 

ii) Because of (3.3) we see that (3.6) satisfies the spatial homogeneity (1.4).

iii) (1.3) is obvious by the orthonormality of {Fjfc”*^(a:)} and the constancy 
on S® of Yo^\x), Next, for the solution fit, x) of (2.15)-(2.16), let f{x) = 
/(O, x) be non-negative on then x) = exp(— e x), (e > 0), satisfies 

= \-g,it, x) - x), (t > 0), 

x) = fix) ^ 0 (on S*). 

Thus g,it, x) ^ 0 on S’, since g,it, x) cannot have a negative minimum on the 
product space [<i, < 2 ] X S’, for any > <1 > 0. For at such minimizing point 
we must have 


dg, 

dt 


= 0 , 


^ = 0 
de ’ 


1^ = 0, 

dip 


de^ 


^0, 


d(p^ 


^ 0 . 


Therefore, since € > 0, <2 > > 0 were arbitrary, we conclude that f(t, x) ^ 

0 on s’ for t > 0 if /(x) =0 on S’. This proves (1.2). The same argument 
simultaneously shows us that the solution P of (2.15)-(2.16) and (1.2)-(1.3) is 
unique. 


ON THE STRONG STABILITY OF A SEQUENCE OF EVENTS 

By Aryeh Dvoretzky

Hebrew University, Jerusalem, and Institute for Advanced Study

1. Summary. M. Loève [3] has found conditions under which a sequence of
events which may be interdependent in an arbitrary manner is strongly stable.
In this note it is established that considerably weaker conditions imply the
strong stability.


2. Introduction. Let

(1) $A_1, A_2, \cdots, A_i, \cdots$



be a sequence of events, which may depend on each other in any way whatsoever, 
defined on the same set of trials. 

Let $R_n$ be the repetition function of (1), i.e. $R_n$ is the number of those among the
first $n$ events: $A_1, A_2, \cdots, A_n$ which were realized, and put $f_n = R_n/n$. The
random variable $f_n$ is called the frequency function of (1).

Denoting by $E\{x\} = \bar x$ the expected value of $x$ it is evident that

$\bar R_n = E\{R_n\} = \sum_{\nu=1}^{n} \Pr(A_\nu)$, $\bar f_n = E\{f_n\} = \dfrac{1}{n}E\{R_n\}$.

Following Loève [3, p. 252] we say that (1) is strongly stable if the sequence
$\varphi_n = f_n - \bar f_n$ $(n = 1, 2, \cdots)$ is strongly stable in the usual Kolmogoroff sense
[1, p. 58], i.e. if

(2) $\lim_{n\to\infty} \Pr\left(\sup_{\nu \ge n} |\varphi_\nu| > \epsilon\right) = 0$

for every $\epsilon > 0$.

Putting¹

$\beta_n = \dfrac{1}{n}\sum_{\nu=1}^{n} \Pr(A_\nu)$, $\gamma_n = \dfrac{2}{n(n-1)}\sum_{1 \le \mu < \nu \le n} \Pr(A_\mu A_\nu)$

and introducing the abbreviation²

$\delta_n = \gamma_n - \beta_n^2$,

Loève's result [3, pp. 257-9] is the following:

If $n\delta_n$ is bounded then (1) is strongly stable.

This, even when specialized to sequences of independent events, includes the
Bernoulli and Poisson cases.

Here the following stronger result will be established.

Theorem. If $\sum \delta_n/n$ is convergent then (1) is strongly stable.

In particular, if for some $\epsilon > 0$ the sequence $n^{\epsilon}\delta_n$ is bounded then (1) is strongly
stable.


3. A lemma. The new tool here used is the following simple result on series of 
positive terms. 

Lemma. Let $a_n \ge 0$ for $n = 1, 2, \cdots$ and

(3) $\sum_{n=1}^{\infty} \dfrac{a_n}{n}$

be convergent. Then there exists a sequence $n_i$ of integers satisfying

(4) $0 < n_{i+1} - n_i = o(n_i)$ $(i \to \infty)$

and such that the series $\sum_{i=1}^{\infty} a_{n_i}$ is convergent.


1 denotes the event: both and Ap, 

* Our j3n , T» and 6n correspond to Lo6ve's pi(n), p 2 (n) and d\ respectively. 



298 


ARYEH DVOEETZKY 


Proof. Since (3) is convergent it is well known³ that there exists a sequence
of numbers $l_n$ $(n = 1, 2, \ldots)$ satisfying

(5) $l_{n+1} \ge l_n, \qquad \lim_{n \to \infty} l_n = \infty$

having the property that

(6) $\sum_{n=1}^{\infty} \frac{l_n a_n}{n} < \infty.$

We define inductively a sequence of integers $m(i)$ through

(7) $m(1) = 1, \qquad m(i + 1) = m(i) + 1 + \left[\frac{m(i)}{l_{m(i)}}\right],$

the square brackets denoting the integral part. Clearly

(8) $0 < m(i + 1) - m(i) = o(m(i)).$

Now for every $i$ we choose $n_i$ so that

$m(i) \le n_i < m(i + 1)$ and $a_{n_i} = \min_{m(i) \le n < m(i+1)} a_n.$

These $n_i$ satisfy the requirements of the lemma.

Indeed, (4) holds in virtue of (8) while applying (5) and (7) we obtain

$a_{n_i} \le \frac{1}{m(i+1) - m(i)} \sum_{n=m(i)}^{m(i+1)-1} a_n \le \frac{m(i+1)}{m(i)} \sum_{n=m(i)}^{m(i+1)-1} \frac{l_n a_n}{n}.$

Since $\sum l_n a_n/n$ converges by (6) it follows from the preceding inequality and (8) that
$\sum a_{n_i} < \infty$ as required.

Corollary. The conclusion of the lemma remains valid if the condition $a_n \ge 0$
is dropped provided (3) is absolutely convergent.
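The construction in the proof of the lemma can be tried numerically. The sketch below is only an illustration: it assumes the recursion $m(1) = 1$, $m(i+1) = m(i) + 1 + [m(i)/l_{m(i)}]$, and the particular sequences `a` and `l` are hypothetical choices (not taken from the paper) for which $\sum a_n/n$ and $\sum l_n a_n/n$ both converge.

```python
import math

def block_starts(l, count):
    """Return m(1), ..., m(count) from the recursion
    m(1) = 1, m(i+1) = m(i) + 1 + floor(m(i) / l(m(i)))."""
    m = [1]
    while len(m) < count:
        m.append(m[-1] + 1 + math.floor(m[-1] / l(m[-1])))
    return m

def pick_indices(a, l, count):
    """In each block m(i) <= n < m(i+1) pick the index n_i that minimises a(n)."""
    m = block_starts(l, count + 1)
    return [min(range(m[i], m[i + 1]), key=a) for i in range(count)]

# Hypothetical example sequences (assumptions for illustration only):
# a_n = 1 / log(n + 2)^2 and l_n = sqrt(log(n + 2)), which increases to infinity.
a = lambda n: 1.0 / math.log(n + 2) ** 2
l = lambda n: math.sqrt(math.log(n + 2))

n_i = pick_indices(a, l, 25)
# Relative gaps (n_{i+1} - n_i) / n_i, which condition (4) requires to tend to zero.
gaps = [(n_i[k + 1] - n_i[k]) / n_i[k] for k in range(len(n_i) - 1)]
```

Because $l_n$ grows very slowly in this example, the relative gaps shrink slowly; the point of the sketch is the mechanics of the blocks, not the rate.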


4. Proof of the theorem. An easy calculation [3, p. 253] gives

$\sigma_n^2 = E\{(f_n - \bar{f}_n)^2\} = \delta_n + \frac{\beta_n - \gamma_n}{n}.$

Since both $\beta_n$ and $\gamma_n$ are between zero and one we have

$-\frac{1}{n} \le \sigma_n^2 - \delta_n \le \frac{1}{n}.$

Therefore it follows from the assumption of the theorem that $\sum (\sigma_n^2/n)$ is convergent.
Hence by the lemma there exists a sequence of integers $n_i$ satisfying
(4) and such that $\sum \sigma_{n_i}^2$ converges.

³ Take e.g. $l_n = \left(\sum_{\nu \ge n} a_\nu/\nu\right)^{-1/2}$ (cf. [2, p. 299]).


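Applying Tchebytcheff's inequality to $\varphi_{n_\nu}$ and adding for $\nu \ge i$
we have for every $\epsilon > 0$

(9) $\Pr\left(\sup_{\nu \ge i} |\varphi_{n_\nu}| > \epsilon\right) \le \frac{1}{\epsilon^2} \sum_{\nu=i}^{\infty} \sigma_{n_\nu}^2.$

If $n_i \le n < n_{i+1}$ then

$\left|\frac{R_n}{n} - \frac{R_{n_i}}{n_i}\right| \le \frac{n_{i+1} - n_i}{n_i}.$

Denoting the last term of this inequality by $\epsilon_i$ and putting $\epsilon_i' = \max_{\nu \ge i} \epsilon_\nu$ we
have from (9)

$\Pr\left(\sup_{n \ge n_i} |\varphi_n| > \epsilon + 2\epsilon_i'\right) \le \frac{1}{\epsilon^2} \sum_{\nu=i}^{\infty} \sigma_{n_\nu}^2.$

As $\epsilon_i' \to 0$ and the right hand term is the remainder of a convergent series, (2)
follows and the theorem is proved.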
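The variance identity used at the start of the proof, $\sigma_n^2 = \delta_n + (\beta_n - \gamma_n)/n$ with $\delta_n = \gamma_n - \beta_n^2$, can be checked exactly on a small example. The sketch below is an illustration only: the coin-tossing experiment and the four events are hypothetical choices, picked simply because they are dependent on one another.

```python
from fractions import Fraction
from itertools import combinations, product

# Hypothetical toy experiment: three fair coin tosses, eight equally likely outcomes.
outcomes = list(product((0, 1), repeat=3))
weight = Fraction(1, len(outcomes))

# Four deliberately dependent events, each given as a predicate on an outcome.
events = [
    lambda w: w[0] == 1,                 # first toss is a head
    lambda w: sum(w) >= 2,               # at least two heads
    lambda w: w[0] == w[1],              # first two tosses agree
    lambda w: w[1] == 1 or w[2] == 1,    # a head on the second or third toss
]
n = len(events)

def prob(pred):
    """Exact probability of the set of outcomes satisfying pred."""
    return sum(weight for w in outcomes if pred(w))

beta = sum(prob(e) for e in events) / n
gamma = Fraction(2, n * (n - 1)) * sum(
    prob(lambda w, e=e, g=g: e(w) and g(w)) for e, g in combinations(events, 2)
)
delta = gamma - beta ** 2

# Variance of the frequency f_n = R_n / n, computed directly from its definition.
freq = lambda w: Fraction(sum(e(w) for e in events), n)
mean = sum(weight * freq(w) for w in outcomes)
variance = sum(weight * (freq(w) - mean) ** 2 for w in outcomes)
```

Exact rational arithmetic is used so that the two sides of the identity can be compared with `==` rather than up to rounding.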

5. Remarks. 1. The lemma used here can also be applied to the study of
the order of magnitude of $\varphi_n$ in the almost certain sense.

2. If the terms of (3) are decreasing then the existence of a convergent
subseries of $\sum a_n$ satisfying (4) implies $\sum_{k=1}^{\infty} a_{2^k} < \infty$. But this is equivalent to the
convergence of the series with monotone terms (3) (cf. e.g. [2, p. 130]). Hence
in this case the convergence of (3) is necessary as well as sufficient for the validity
of the lemma. It may be possible to use this remark in order to establish in
some special cases, where the interdependence of the variables decreases steadily
in a suitable sense, necessary and sufficient conditions for strong stability.

3. The sequence of $\delta_n$ is, of course, of very specialized structure. Thus, since
the stability of (1) is equivalent [3, p. 255] to $\delta_n \to 0$ and is implied by strong
stability, it follows that $\delta_n \to 0$ whenever $\sum (\delta_n/n)$ is convergent.

Added in proof: Since this paper was submitted I heard from Professor M.
Loève that he has independently obtained the theorem of section 2 by another
method.


REFERENCES 

[1] A. Kolmogoroff, Grundbegriffe der Wahrscheinlichkeitsrechnung, Ergeb. d. Math., Vol. 2,
no. 3, Springer, Berlin, 1933.

[2] K. Knopp, Theory and Applications of Infinite Series, Blackie, London and Glasgow,
1928.

[3] M. Loève, “Étude asymptotique des sommes de variables aléatoires liées,” Jour. de
Math. pures et appl., Vol. 24 (1945), pp. 249-318.





A NOTE ON WEIGHING DESIGN 

By K. S. Banerjee 
Pusa, Bihar, India

1. Efficiency of weighing designs given by a three-fourth replicate. In the
June issue of the Annals, Kempthorne [1] approached the construction of the
orthogonal matrix X through fractional replicates, the original treatment of
which was given by Finney [2]. Reference has been made to the use of a three
fourth replicate for weighing designs. Details for such designs have not been
furnished as their efficiency is lower than for the designs given by the
completely orthogonal matrix X. In a three fourth replicate the treatment
combinations have to be chosen in a particular manner for a comparatively easier
analytical treatment both from the point of view of agrobiological experiments
as well as weighing designs. The variance of each of the estimates in such a case
will be $\sigma^2/2^{n-1}$. As a matter of fact, in a weighing design given by a fractional
replicate of the type $(2^\beta - 1)/2^\beta$ $(\beta = 1, 2, \ldots, n)$ of $2^n$ experiments, the
estimate of the variance of each object is independent of the fraction used and
is equal to $\sigma^2/2^{n-1}$, the same as above.

2. Construction of a three fourth replicate. Kempthorne mentions that a
factorial design of fraction 3/4 could be taken to consist of a 1/2 replicate on the
identity $I = ABC$ and a quarter replicate based on the identity

$I = A = BC = ABC.$

If the half replicate based on the identity $I = ABC$ be taken to consist of all the
treatments corresponding to the minus signs of the treatment contrast ABC [3],
the additional quarter replicate can be chosen in two different ways. When
however the treatments corresponding to the minus signs of both A and BC
are kept, omitting the treatments corresponding to the plus signs of A and BC,
the three fourth replicate so obtained will have certain advantages, which will
not be available if the quarter replicate to be added is chosen to consist of the
treatments corresponding to the plus signs of A and BC.

3. Behavior of the contrasts in a three fourth replicate and the efficiency of
the weighing designs. In general, if there are n treatments giving rise to $2^n$
treatment combinations and if the defining contrasts be chosen as

$I = ACD = BDE = ABCE,$

it will be necessary to omit the treatment combinations corresponding to the
plus signs of both ACD and BDE, which will be $2^{n-2}$ in number. In the three
fourth replicate so obtained, $2^n$ treatment effects (inclusive of the mean) will
divide themselves into sets of 4 treatment contrasts each. One of the sets will
be I, ACD, BDE and ABCE and any other set will be formed by multiplying
any treatment contrast by the defining set namely, I, ACD, BDE and ABCE.
Only three contrasts out of four in a set will be independent, so that only one of
the contrasts, preferably the one of the highest order interaction, may be kept
as an alias (in agrobiological experiments) of the remaining three and may
therefore be omitted. Each of the four contrasts within a set will be orthogonal
to each of the other contrasts in the remaining sets, but within a set the four
contrasts will be non-orthogonal to one another. Though non-orthogonal, the
normal equations will be of the systematic type¹ and the matrix X'X, taking
any three contrasts out of each set of four, will take the following form:



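(1) $X'X = \begin{bmatrix}
x & a & a & 0 & 0 & 0 & 0 & 0 & 0 & \cdots \\
a & x & a & 0 & 0 & 0 & 0 & 0 & 0 & \cdots \\
a & a & x & 0 & 0 & 0 & 0 & 0 & 0 & \cdots \\
0 & 0 & 0 & x & a & a & 0 & 0 & 0 & \cdots \\
0 & 0 & 0 & a & x & a & 0 & 0 & 0 & \cdots \\
0 & 0 & 0 & a & a & x & 0 & 0 & 0 & \cdots \\
0 & 0 & 0 & 0 & 0 & 0 & x & a & a & \cdots \\
0 & 0 & 0 & 0 & 0 & 0 & a & x & a & \cdots \\
0 & 0 & 0 & 0 & 0 & 0 & a & a & x & \cdots \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{bmatrix}$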



where the order of the matrix $N = \tfrac{3}{4} 2^n$ is of the form $3t$ $(t = 2^{n-2})$ and $x = 3 \cdot 2^{n-2}$,
$a = -\tfrac{1}{2} 2^n + \tfrac{1}{4} 2^n = -2^{n-2}$. The value of the above determinant $= (x - a)^{2t}(x + 2a)^t$
and that of the determinant suppressing the first row and the first
column $= (x - a)^{2t-1}(x + a)(x + 2a)^{t-1}$. Hence
$c^{11} = \frac{x + a}{(x - a)(x + 2a)} = 1/2^{n-1}$, substituting for x and a. The variance of each estimate will therefore
be $\sigma^2/2^{n-1}$.

4. General case. When a fraction of the type $\alpha/2^\beta = (2^\beta - 1)/2^\beta$ is used,
the treatment combinations corresponding to the plus signs of the $\beta$ independent
contrasts are omitted. Out of each set of $2^\beta$ treatment contrasts, only $\alpha = 2^\beta - 1$
will be independent and the matrix will then take a form like that of (1), where

$x = [(2^\beta - 1)2^n]/2^\beta = 2^{n-\beta}(2^\beta - 1)$ and
$a = -2^{n-1} + [(2^{\beta-1} - 1)2^n]/2^\beta = -2^{n-\beta}.$

$c^{11} = \frac{x + (\alpha - 2)a}{(x - a)[x + (\alpha - 1)a]} = \frac{2 \cdot 2^{n-\beta}}{2^n \, 2^{n-\beta}} = 1/2^{n-1}.$

The variance of each estimate $= \sigma^2/2^{n-1}$, the same as before. When a
completely orthogonalised matrix of the order $(\alpha 2^n)/2^\beta = 2^{n-\beta}(2^\beta - 1)$ is available,
the variance of an estimate will be $\sigma^2/[2^{n-\beta}(2^\beta - 1)]$. The ratio of the two
variances $= 2^{n-1}/(2^n - 2^{n-\beta}) = 2^{\beta-1}/(2^\beta - 1)$, which shows how the efficiency
of the weighing design decreases with the increasing value of the fraction.
When $\beta = 1$, i.e. in a half replicate, the efficiency is 100 percent. The value of
the fraction is never less than 1/2.
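The entries $x$ and $a$ of X'X for the three fourth replicate of section 3 can be checked by direct enumeration. The sketch below is an illustration under stated assumptions: it takes $n = 5$ factors labelled A to E (a hypothetical choice, the smallest that accommodates the defining contrasts ACD and BDE), codes levels as -1/+1, and drops the quarter in which both ACD and BDE carry a plus sign.

```python
from fractions import Fraction
from itertools import product
from math import prod

FACTORS = "ABCDE"                       # n = 5 factors (illustrative choice)
n = len(FACTORS)

def contrast(word, run):
    """Sign (+1 or -1) of the contrast named by `word` in one treatment combination."""
    return prod(run[FACTORS.index(letter)] for letter in word)

# Full 2^n factorial with levels coded -1 / +1, then drop the quarter in which
# both ACD and BDE carry a plus sign.
full = list(product((-1, 1), repeat=n))
kept = [run for run in full
        if not (contrast("ACD", run) == 1 and contrast("BDE", run) == 1)]

def inner(u, v):
    """Entry of X'X for the two contrasts u and v over the retained runs."""
    return sum(contrast(u, run) * contrast(v, run) for run in kept)

x = inner("ACD", "ACD")                 # diagonal entry
a = inner("ACD", "BDE")                 # off-diagonal entry within a set
cross = inner("ACD", "A")               # contrasts taken from two different sets
variance_factor = Fraction(x + a, (x - a) * (x + 2 * a))
```

With these choices the enumeration gives $x = 3 \cdot 2^{n-2}$, $a = -2^{n-2}$, a zero inner product between contrasts from different sets, and a variance factor of $1/2^{n-1}$.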

¹ The analysis of the data available from agrobiological experiments will not be
cumbersome to a prohibitive extent as in many other experiments where non-orthogonality creeps
in. The results of investigation in this direction have already been communicated for
publication elsewhere.










5. Independence of the estimates given by $L_N$ in a biased spring balance.
Kempthorne mentions that although the optimum designs for the spring balance
case suggested by Mood furnish somewhat smaller variance than what is given
by fractional replicates, these designs have the disadvantage that the estimates
are correlated, whereas the estimates furnished by fractional replicates are
orthogonal. The designs furnished by fractional replicates take account of the
bias and if the weighing operation corresponding to the bias is omitted (in case
where the spring balance is free from bias), the resultant scheme will fail to give
independent estimates and the variance factors will be of the same magnitude
as in the optimum design $L_N$ of Mood with the same number of weighings.
Again, these optimum designs may also be made to furnish independent estimates
when the designs are adjusted in the manner as suggested by Mood to suit a
biased spring balance.

It is true that the design matrix $L_3$ given by

$X = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix}$

does not give independent estimates as such; but when it is assumed that the
spring balance has a bias and the design matrix is modified as follows:

(2) $X = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 1 & 0 \\ 1 & 1 & 0 & 1 \\ 1 & 0 & 1 & 1 \end{bmatrix},$

the estimates except that for the bias will be orthogonal to one another and the
variance of the estimated weights will necessarily be larger in value.

Before proving the general case, we notice that when $-1$ is substituted for 0
in (2) above, the resultant scheme will be an orthogonalised matrix. This is
true not only in this particular instance but will hold good also in general. The
constitution will be clear when the method of construction of $L_N$ from $H_{N+1}$
is recalled.

The distribution of ones in $L_N$ gives a special type of symmetrical balanced
incomplete block design, where $r = k = \tfrac{1}{2}(N + 1)$ and $\lambda = \tfrac{1}{4}(N + 1)$, while the
distribution of zeros gives the complementary design for which $r_0 = r - 1$,
$k_0 = k - 1$ and $\lambda_0 = \lambda - 1$. Therefore when a row of zeros and a column of
ones (in that order) is added to $L_N$, the matrix X'X of the resultant scheme takes
the following form:


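(3) $X'X = \begin{bmatrix}
N + 1 & r & r & \cdots & r \\
r & r & \lambda & \cdots & \lambda \\
r & \lambda & r & \cdots & \lambda \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
r & \lambda & \lambda & \cdots & r
\end{bmatrix}$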

Making use of the identities well known in the theory of balanced incomplete
block designs and remembering the relationships $2\lambda = r = k = \tfrac{1}{2}(N + 1)$,

(I) The value of the determinant of X'X
$= (r - \lambda)^{N-1}[(N + 1)\{r + \lambda(N - 1)\} - r^2 N] = (r - \lambda)^{N-1}[r + \lambda(N - 1)],$

(II) The value of the determinant suppressing the first row and the first
column $= (r - \lambda)^{N-1}[r + \lambda(N - 1)],$

(III) The value suppressing the second row and the second column
$= (r - \lambda)^{N-2}[(N + 1)\{r + \lambda(N - 2)\} - r^2(N - 1)]$
$= (r - \lambda)^{N-2}[r + \lambda(N - 1)],$

(IV) The value suppressing the first row and the third column
$= (r - \lambda)^{N-2}[r\{r + \lambda(N - 2)\} - r\lambda(N - 1)]$
$= r(r - \lambda)^{N-1},$

(V) The value suppressing the second row and third column
$= (r - \lambda)^{N-2}\{\lambda(N + 1) - r^2\}$
$= 0.$

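Hence, the reciprocal matrix of X'X will be given by

(4) $[X'X]^{-1} = \begin{bmatrix}
1 & -1/k & -1/k & \cdots & -1/k \\
-1/k & 2/k & 0 & \cdots & 0 \\
-1/k & 0 & 2/k & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
-1/k & 0 & 0 & \cdots & 2/k
\end{bmatrix}$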


Let Y' denote the column matrix of the results of the weighings, $y_0, y_1, \ldots, y_N$
and B' the column matrix of the estimates of the weights $b_0, b_1, \ldots, b_N$. Then
the estimates will be given by the equation

$B' = [X'X]^{-1}X'Y'.$

It is easy to see that all the rows except the first in $[X'X]^{-1}X'$ are orthogonal to
one another. To explain this, let us take the design given by (2). Here

$X' = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 \end{bmatrix}.$

Then $[X'X]^{-1}X'$ will be of the form

$\begin{bmatrix} 1 & 0 & 0 & 0 \\ -1/k & +1/k & +1/k & -1/k \\ -1/k & +1/k & -1/k & +1/k \\ -1/k & -1/k & +1/k & +1/k \end{bmatrix}.$

In all the rows excepting the first, for every 0 and +1 in X', there will
respectively be a $-1/k$ and a $+1/k$ in $[X'X]^{-1}X'$. It has been mentioned before
that an orthogonal matrix is obtained when $-1$ is substituted for every 0 in X
or X'. Hence, N rows (all except the first) of $[X'X]^{-1}X'$ will be orthogonal
and these N rows will estimate the N weights in orthogonal linear combinations
of $y_0, \ldots, y_N$.
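The pattern of the reciprocal matrix and the orthogonality of the estimating rows can be verified for the four-weighing design (2). The sketch below is a check for that single case only ($N = 3$, $k = 2$), using exact fractions; the helper names are hypothetical.

```python
from fractions import Fraction

# Design (2): a weighing with the pan empty, then L_3 with a leading column of ones.
X = [[1, 0, 0, 0],
     [1, 1, 1, 0],
     [1, 1, 0, 1],
     [1, 0, 1, 1]]
N = 3
k = Fraction(N + 1, 2)                  # k = r = (N + 1) / 2

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(p, q):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*q)] for row in p]

Xt = transpose(X)
XtX = matmul(Xt, X)

# Candidate inverse in the pattern of (4): 1 in the corner, -1/k along the
# border, 2/k on the rest of the diagonal, 0 elsewhere.
inverse = [[Fraction(1) if i == j == 0
            else -1 / k if 0 in (i, j)
            else 2 / k if i == j
            else Fraction(0)
            for j in range(N + 1)] for i in range(N + 1)]

identity = [[int(i == j) for j in range(N + 1)] for i in range(N + 1)]
estimator = matmul(inverse, Xt)         # rows give the estimates as combinations of y
```

Multiplying `XtX` by `inverse` should return the identity, and the last three rows of `estimator` should have zero inner products with one another, which is the orthogonality claimed for the estimated weights.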

It has been mentioned before that the distribution of zeros in $L_N$ gives the
complementary design, for which $r_0 = r - 1$, $k_0 = k - 1$ and $\lambda_0 = \lambda - 1$.
If to such a design, a row of ones and a column of ones (in that order) be added
to suit the estimation of the weights in a biased spring balance, exactly a similar
situation will be obtained and the estimates will be orthogonal. It can readily
be seen that the design furnished by Yates to weigh seven light objects and a
bias is an illustration of this kind. The scheme given by Yates is the
complementary design of $L_7$ with an additional row and a column of ones added to $L_7$.

The sixteen combinations of ten objects, a, b, c, d, e, f, g, h, k, l include 1,
which corresponds to weighing with empty pans or, in other words, which is
devoted to estimating the bias. When 1 is omitted, X'X will be of the form


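$X'X = \begin{bmatrix}
r & \lambda & \lambda & \cdots & \lambda \\
\lambda & r & \lambda & \cdots & \lambda \\
\lambda & \lambda & r & \cdots & \lambda \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\lambda & \lambda & \lambda & \cdots & r
\end{bmatrix}$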


where $r = 8$ and $\lambda = 4$. The above matrix X'X is obviously of the same form
as given by $L_{15}$.

By following exactly the same procedure as given above, it can easily be seen 
that when the weighing operation 1 is included in the weighing design, the 
solution of the normal equations will lead to independent estimates. The 
absence of each letter will be a 0 and the presence a + 1 in the design matrix 
and if — 1 is substituted for every zero, the resultant matrix will be orthogonal. 
In some cases, however, the number of letters in all the combinations will not be 
the same, i.e. k will not be constant. In such a situation, k in (4) will take the 
value of r or of 2X. 


REFERENCES 

[1] O. Kempthorne, “The factorial approach to the weighing problem,” Annals of Math.
Stat., Vol. 19 (1948), pp. 238-245.

[2] D. J. Finney, “The fractional replication of factorial arrangements,” Annals of
Eugenics, Vol. 12 (1945), pp. 291-301.

[3] F. Yates, Tech. Commun. Bur. Soil Sci. Harpenden no. 35 (1937), p. 11.

[4] Harold Hotelling, “Some improvements in weighing and other experimental
techniques,” Annals of Math. Stat., Vol. 15 (1944), pp. 297-306.

[5] K. Kishen, “On the design of experiments for weighing,” Annals of Math. Stat., Vol. 16
(1945), pp. 294-301.

[6] A. M. Mood, “On Hotelling's weighing problem,” Annals of Math. Stat., Vol. 17 (1946),
pp. 432-446.

[7] R. L. Plackett and J. P. Burman, “The design of optimum multifactorial experiments,”
Biometrika, Vol. 33 (1946), pp. 305-325.










CONTROL CHART FOR LARGEST AND SMALLEST VALUES 

By John M. Howell 
Los Angeles City College 

1. Introduction. It may at times be desirable to use a control chart for 
largest and smallest values (L & S) in place of the conventional charts for 
averages and ranges ($\bar{X}$ & $R$). The chart for largest and smallest values has
certain advantages: all information may be combined on one chart, computations 
are simple, and specifications may be placed on the chart. In this paper, 
constants for the use of this chart are developed and comparison is made with 
the average and range charts. 

2. Constants for determining limits. Let $L$ and $S$ denote the largest and
smallest values, respectively, in a sample of $n$ pieces, and let $\bar{L}$ and $\bar{S}$ denote the
averages of these values for $k$ samples. Then $(\bar{L} + \bar{S})/2$ and $(\bar{L} - \bar{S})/d_2$ are
unbiased estimates of the population mean and standard deviation, respectively,
in the case of a random sample from a normal population. The value of the
constant $d_2$ is given in [1] and repeated in Table I for convenience. If we denote
$(\bar{L} + \bar{S})/2$ by $\bar{M}$ and $(\bar{L} - \bar{S})$ by $\bar{R}$, control limits may be determined in terms of
these statistics.

In conformance with usual control chart practice, we will set the upper control
limit at $\bar{L} + 3\hat{\sigma}_L$ and the lower control limit at $\bar{S} - 3\hat{\sigma}_S$, where $\hat{\sigma}_L$ is an estimate
of the standard deviation of the largest values in samples drawn from a normal
population, and similarly for $\hat{\sigma}_S$. The results of Tippett [2] and Pearson [3]
for $E(R)$ of samples from a normal population were used to determine expected
values of $L$ and $S$: $E(R) = d_2\sigma$. Here, $R$ is the range of samples of size $n$:
$R = L - S$. But since $E[(L + S)/2] = a$ for a symmetrical distribution, then
$E(L) = a + d_2\sigma/2$ and $E(S) = a - d_2\sigma/2$, where $a$ and $\sigma$ are the mean and
standard deviation of the normal population from which samples are drawn.

The probability element of the largest value [4] is given by:

$n[F(L)]^{n-1} f(L)\, dL$ where $f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$ and $F(x) = \int_{-\infty}^{x} f(y)\, dy$.

The second moment of $L$ is therefore $n \int_{-\infty}^{\infty} L^2 [F(L)]^{n-1} f(L)\, dL$. Integrals of this type, differing only
by a constant factor, have been evaluated by Hojo [5] and from his results $d_4$ was
determined so that $\sigma_L = \sigma_S = d_4\sigma$. Values for $d_4$ for $n = 2, 5, 10$ are also given
by Tippett [2]. “Three-sigma” control limits may then be given in the form:
$\bar{M} \pm A_3\bar{R}$, where $A_3 = 0.5 + 3d_4/d_2$. The expected value of the upper control
limit will then be: $E(UCL) = a + A_4\sigma$, where $A_4 = (d_2/2) + 3d_4$. Values of
these constants for various sample sizes are given in Table I.

In practice, it might be desired, in the case of control charts for individual
measurements or for $L$ and $S$, to have $E(UCL) = a + 3\sigma$, and the lower control
limit symmetrically placed with respect to the central line. In this case, the
formula for the limits would be: $\bar{M} \pm 3\bar{R}/d_2$ or $\bar{M} \pm \sqrt{n}A_2\bar{R}$, where
$A_2 = 3/(d_2\sqrt{n})$ is given in [1]. Since the efficiency of $\bar{M}$ decreases rapidly with
increasing sample size [6], it would probably be better to use $\bar{\bar{X}}$ in place of
$\bar{M}$ for determining the central line for a control chart when the sample size is
greater than five. $\bar{\bar{X}}$ is the “average of averages” as defined in [1].

The chart for largest and smallest values would then consist of a chart on
which both the largest and smallest values are plotted, with the central line at $\bar{M}$,
and the limits as given above.
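The chart constants can be recomputed from $d_2$ and $d_4$. The sketch below is an illustration that assumes the relations $A_2 = 3/(d_2\sqrt{n})$, $A_3 = 0.5 + 3d_4/d_2$ and $A_4 = d_2/2 + 3d_4$; the function names are hypothetical, and the dictionary holds only a few of the tabulated $d_2$, $d_4$ pairs.

```python
import math

# (d_2, d_4) for a few sample sizes, copied from Table I.
TABLE = {3: (1.693, 0.748), 5: (2.326, 0.670), 10: (3.076, 0.588)}

def chart_constants(n, d2, d4):
    """Return (A_2, A_3, A_4) for samples of size n."""
    a2 = 3.0 / (d2 * math.sqrt(n))
    a3 = 0.5 + 3.0 * d4 / d2
    a4 = d2 / 2.0 + 3.0 * d4
    return a2, a3, a4

def limits(m_bar, r_bar, a3):
    """Three-sigma limits M-bar -/+ A_3 * R-bar for the L & S chart."""
    return m_bar - a3 * r_bar, m_bar + a3 * r_bar

constants = {n: chart_constants(n, d2, d4) for n, (d2, d4) in TABLE.items()}
```

Because the tabulated $d_2$ and $d_4$ are themselves rounded, the recomputed constants agree with the printed ones only to about the last printed digit.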

3. Comparison of charts for a particular case. A comparison of the L & S
chart with the $\bar{X}$ chart for a particular case in which the sample size was three is
given in Fig. 1. Measurements were the shear strength of spotweld coupons of
aluminum in pounds. Since the range chart had no points above the “three-
sigma” control limit and showed no other peculiarities, it has been omitted.

TABLE I

Constants for largest and smallest value chart

  n      d_2      d_4      A_2      A_3      A_4
  2     1.128     .825    1.880     2.72     3.03
  3     1.693     .748    1.023     1.82     3.09
  4     2.059     .709     .729     1.53     3.15
  5     2.326     .670     .577     1.36     3.17
  6     2.534     .648     .483     1.27     3.21
  7     2.704     .627     .419     1.20     3.23
  8     2.847     .614     .373     1.15     3.26
  9     2.970     .600     .337     1.10     3.28
 10     3.076     .588     .308     1.07     3.30


4. General comparison of charts. We assume a mean of zero and a standard
deviation of unity as a “given standard,” and then compute the probabilities
when the true values are $a$ and $\sigma$. The probability of a point being inside of
“3-sigma” control limits on the range chart under these conditions is:
$P_1 = \Pr(R < d_2 D_4/\sigma)$, where $D_4$ is given in [1]. The probabilities for the
range used here were found from the Pearson-Hartley tables [3]. The usual
normality assumptions are made.

The probability of a point being inside of “3-sigma” control limits on the
average chart under the same conditions is:

$P_2 = \frac{1}{\sqrt{2\pi}} \int_{(\sqrt{n}/\sigma)(-3/\sqrt{n} - a)}^{(\sqrt{n}/\sigma)(3/\sqrt{n} - a)} e^{-t^2/2}\, dt.$
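The probability $P_2$ is a difference of two normal distribution function values and is easy to evaluate. The sketch below is an illustration that assumes limits of $\pm 3/\sqrt{n}$ set for mean 0 and standard deviation 1, with true values $a$ and $\sigma$; the function name `p2` is hypothetical.

```python
import math
from statistics import NormalDist

def p2(n, a, sigma):
    """Probability that a sample mean of n observations from a normal
    population with mean a and standard deviation sigma falls inside the
    limits -3/sqrt(n), +3/sqrt(n) set for mean 0 and standard deviation 1."""
    scale = math.sqrt(n) / sigma
    upper = scale * (3.0 / math.sqrt(n) - a)
    lower = scale * (-3.0 / math.sqrt(n) - a)
    std = NormalDist()
    return std.cdf(upper) - std.cdf(lower)
```

For example, `p2(3, 0.0, 1.0)` is close to .9973 and `p2(3, 0.0, 2.0)` is close to .866.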




Since Daly [7] has shown that the average and range of samples from a normal
population are independent, the probability that a sample is within control
limits on both charts is the product of the probabilities: $P_1P_2$. Thus the
probability that a sample be outside of control limits on either chart is $1 - P_1P_2$.
The probability of the largest and smallest values both lying in the interval
from $-c$ to $c$ is:

$P_3 = \Pr(-c < S,\ L < c) = \left[\int_{(-c-a)/\sigma}^{(c-a)/\sigma} \varphi(t)\, dt\right]^n.$

Values of this expression with lower limit $-\infty$ are given in table XXI of [8] for samples of
sizes 3, 5, and 10. For the purpose of comparing the charts, we choose $c$ so that
the probabilities of Type I errors are equal, that is: $1 - P_1P_2 = 1 - P_3$ or $P_1P_2 = P_3$
when the mean is zero and the standard deviation unity. Substituting in this
equation and solving, we find: $P(c) = 0.5 + 0.5(.9973P_1)^{1/n}$, where
$P(x) = \int_{-\infty}^{x} \varphi(t)\, dt$. For $n = 3$, $c = 2.99$ and for $n = 5$, $c = 3.15$.

Comparing $P_1P_2$ with $P_3$ when the true values are $a$ and $\sigma$ will then show the
relative power of the $\bar{X}$ & $R$ charts and the L & S chart for detecting lack of
control.

Finally the charts are compared by finding the number ($N_1$ for the $\bar{X}$ & $R$
charts and $N_2$ for the L & S chart) of samples which will detect lack of control
with a .99 probability under the conditions given above. This is done by
finding the smallest integer which satisfies the following inequalities: $(P_1P_2)^{N_1} <
.01$ and $P_3^{N_2} < .01$. As may be seen from Table II, under most conditions, the
L & S chart is nearly as good as the $\bar{X}$ & $R$ charts for detecting lack of control.

Fig. 1. Chart for largest and smallest values (vertical axis: shear strength of spotweld coupon in pounds).

TABLE II

  n     a      σ      P_1     P_2    P_1P_2    P_3     N_1    N_2
  3    0      1.0    .994    .997    .991     .991     510    510
              1.2    .973    .988    .961     .963     116    122
              1.5    .901    .955    .860     .868      31     33
              2.0    .721    .866    .624     .645      10     11
  3    0.5    1.0    .994    .983    .977     .980     198    228
              1.2    .973    .935    .935     .939      69     74
              1.5    .901    .917    .826     .834      25     27
              2.0    .721    .830    .598     .694       9     13
  3    1.0    1.0    .994    .898    .893     .931      41     65
              1.2    .973    .855    .832     .860      25     31
              1.5    .901    .802    .723     .740      15     17
              2.0    .721    .746    .538     .550       8      8
  3    2.0    1.0    .994    .323    .321     .590       5      9
              1.2    .973    .352    .342     .510       5      7
              1.5    .901    .378    .341     .414       5      6
              2.0    .721    .408    .294     .321       4      5
  5    0      1.0    .995    .997    .992     .992     570    570
              1.2    .969    .988    .957     .957     105    105
              1.5    .855    .955    .817     .878      23     36
              2.0    .588    .866    .509     .515       7      8
  5    0.5    1.0    .995    .970    .965     .980     130    227
              1.2    .969    .942    .913     .927      51     62
              1.5    .855    .891    .762     .791      17     20
              2.0    .588    .805    .473     .505       7      7
  5    1.0    1.0    .995    .776    .722     .923      15     58
              1.2    .969    .736    .713     .828      14     25
              1.5    .855    .695    .594     .661       9     12
              2.0    .588    .648    .381     .426       5      6
  5    2.0    1.0    .995    .071    .071     .512       2      7
              1.2    .969    .110    .107     .402       3      6
              1.5    .855    .164    .140     .286       3      4
              2.0    .588    .230    .135     .185       3      3
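The two computations behind the comparison, the matching half-width $c$ and the run counts $N_1$, $N_2$, can be sketched as follows. This is an illustration only: it assumes $c$ solves $(2P(c) - 1)^n = .9973\,P_1$ with $P$ the standard normal distribution function, and the function names are hypothetical. The three-decimal values of $P_1$ from Table II are used as inputs, so the result for $c$ agrees with the printed value only approximately.

```python
from statistics import NormalDist

def matching_c(n, p1, p2_in_control=0.9973):
    """Half-width c such that (2 * Phi(c) - 1) ** n equals P_1 * P_2
    when the mean is zero and the standard deviation unity."""
    target = 0.5 + 0.5 * (p2_in_control * p1) ** (1.0 / n)
    return NormalDist().inv_cdf(target)

def samples_to_detect(p_inside, miss=0.01):
    """Smallest integer N with p_inside ** N < miss (requires 0 < p_inside < 1)."""
    count = 1
    while p_inside ** count >= miss:
        count += 1
    return count

c3 = matching_c(3, 0.994)
c5 = matching_c(5, 0.995)
```

For instance, `samples_to_detect(0.624)` gives 10 and `samples_to_detect(0.645)` gives 11, matching the $n = 3$, $a = 0$, $\sigma = 2.0$ row of Table II.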

REFERENCES 

[1] American Standards Association, Control Chart Method of Controlling Quality during
Production, Z1.3-1942.

[2] L. H. C. Tippett, “On the extreme individuals and the range of samples taken from a
normal population,” Biometrika, Vol. 18 (1925), pp. 364-387.

[3] E. S. Pearson, “The probability integral of the range in samples of n observations
from a normal population,” Biometrika, Vol. 32 (1942), pp. 301-308.

[4] S. S. Wilks, Mathematical Statistics, Princeton University Press, 1943, p. 91.

[5] Hojo, “Distribution of median from a normal population,” Biometrika, Vol. 23 (1931),
p. 315.

[6] W. A. Shewhart, Economic Control of Quality of Manufactured Product, D. Van
Nostrand Co., 1931, p. 282.

[7] J. F. Daly, “On the use of the sample range in an analogue of Student's t-test,” Annals
of Math. Stat., Vol. 17 (1946), pp. 71-74.

[8] Karl Pearson, Tables for Statisticians and Biometricians, Cambridge University
Press, 1914.


SUFFICIENCY, TRUNCATION AND SELECTION¹

By John W. Tukey 
Princeton University 

1. Summary. The fact that the mean and variance were sufficient statistics
for a univariate normal distribution truncated at a fixed point was known to
Fisher by 1931 [2]. Hotelling [3] has recently observed the corresponding fact
for the truncated multivariate normal distribution.

¹ Prepared in connection with work sponsored by the Office of Naval Research.

It is the aim of this note to point out that these are special cases of a general
result, namely: If a family of distributions admits a set of sufficient statistics, then
the family obtained by truncation to a fixed set, or by fixed selection, also admits the
SAME set of sufficient statistics.


2. Representation. The basic formal results about sets of sufficient statistics 
are due to Fisher [1], whose arguments, with obvious modifications, establish 
that families of distributions satisfying the usual conditions have sufficient 
statistics. The converse was established by Koopman [4] for a reasonably wide 
class of families. 

The usual condition can be easily handled and given wide application by
representing the family of distributions in a form suggested to the author by
Rubin, and ascribed by him to Cramér, namely:

$dF(x \mid \theta) = c(\theta) f(x \mid \theta)\, d\mu(x),$

where $x$ is a possibly multidimensional chance quantity (i.e. random variable),
$\theta$ is a possibly multidimensional parameter, $c(\theta)$ is a positive real function of $\theta$
which serves to normalize the distribution, $f(x \mid \theta)$, the relative probability
density, is a non-negative real function of $x$ and $\theta$, and $\mu(x)$ is a positive measure
function. In this representation the natural and sufficient condition that
$\{h_i(x)\}$ are a set of sufficient statistics for $\theta$ is the existence of functions $a_i(\theta)$
such that (cf. Koopman [4])

(1) $\frac{\partial \log f(x \mid \theta)}{\partial \theta} = \sum_i a_i(\theta) h_i(x).$

When $\theta$ is a vector, the derivative is to be interpreted as the gradient (a vector)
and the $a_i(\theta)$ are to be vector-valued functions of $\theta$. We notice that this
condition concerns only the relative density function.


3. Proof of result. Suppose the family $F(x \mid \theta)$ is truncated onto a Borel set
$E$; this means that

$\Pr\{x \text{ in } A \mid F(x \mid \theta) \text{ truncated to } E\} = \frac{\Pr\{x \text{ in } A \cap E \mid F(x \mid \theta)\}}{\Pr\{x \text{ in } E \mid F(x \mid \theta)\}}.$

If $\phi_E(x)$ is the characteristic function of $E$, which is $= 1$ for $x$ in $E$ and $= 0$
otherwise, and if

$h(\theta) = \Pr\{x \text{ in } E \mid F(x \mid \theta)\} = \int_E dF(x \mid \theta),$

then the probability element of $F(x \mid \theta)$ truncated to $E$ is

$\frac{c(\theta)}{h(\theta)} f(x \mid \theta)\, \phi_E(x)\, d\mu(x) = c'(\theta) f(x \mid \theta)\, d\nu(x),$

where $c'(\theta) = c(\theta)/h(\theta)$ and $d\nu(x) = \phi_E(x)\, d\mu(x)$. Truncation has not changed
the relative density function, and the result follows from the form of (1).

Next suppose that, instead of accepting values with probability one in $E$
and with probability zero outside $E$, we select according to a fixed Borel function
$\phi(x)$, the chance of accepting a value $x$ being $\phi(x)$. The new family of
distributions has the same sufficient statistics for the same reason.
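A small numerical illustration of the result for the truncated univariate normal case: if the sum and the sum of squares remain sufficient after truncation, two samples that share $n$, $\sum x$ and $\sum x^2$ must have the same likelihood for every parameter value. The sketch below assumes truncation to $x \ge 0$; the two samples and the parameter grid are hypothetical choices made only for the check.

```python
import math
from statistics import NormalDist

def truncated_loglik(sample, mu, sigma, lower=0.0):
    """Log-likelihood of a normal(mu, sigma) law truncated to x >= lower."""
    law = NormalDist(mu, sigma)
    log_mass = math.log(1.0 - law.cdf(lower))      # log h(theta)
    return sum(math.log(law.pdf(x)) for x in sample) - len(sample) * log_mass

# Two different samples sharing n, the sum (12) and the sum of squares (62).
first = [1.0, 5.0, 6.0]
second = [2.0, 3.0, 7.0]

grid = [(mu, sigma) for mu in (0.0, 3.0, 5.0) for sigma in (1.0, 2.0, 4.0)]
gaps = [abs(truncated_loglik(first, m, s) - truncated_loglik(second, m, s))
        for m, s in grid]
```

The truncation enters only through the normalizing term `log_mass`, which depends on the parameters and the sample size but not on the individual observations, so the differences in `gaps` vanish up to rounding.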


REFERENCES 

[1] R. A. Fisher, “Theory of statistical estimation,” Camb. Phil. Soc. Proc., Vol. 22
(1923-25), pp. 700-725.

[2] R. A. Fisher, “The sampling error of estimated deviates together with other
illustrations of the properties and applications of the integrals and derivatives of the
normal error function,” Brit. Assn. Adv. Sci. Mathematical Tables, Vol. 1, xxvi-xxxv.

[3] H. Hotelling, “Abstracts of Madison Meeting,” Annals of Math. Stat., Vol. 19 (1948).

[4] B. O. Koopman, “On distributions admitting a sufficient statistic,” Trans. Amer.
Math. Soc., Vol. 39, pp. 399-409.


ON A PROBABILITY DISTRIBUTION 
By Max A. Woodbury 
University of Michigan 

1. Introduction. The problem treated is that of generalizing the Bernoulli
distribution to the case where the probability of success is not constant from trial
to trial but depends on the number of previous successes. The case where the
probability of an event depends on the number of trials is easily handled and
is not the case treated here. Several special cases of such a distribution have
been worked out at one time or another. (E.g. C. C. Craig found the solution for
one such special case and thus called the author's attention to the problem.)

The solution involves the Newton divided difference expansion of powers in a
form which can be utilized for computation if the number of trials is not too
large. In the case where the probabilities on a single trial are small an
approximation (similar to that of the Poisson distribution to the Bernoulli distribution)
can be found.

Applications can obviously be made to urn schema in which black balls are 
replaced, but white balls are removed. Similarly, applications can be made to 
the distribution of the number of plants in a given area. 

2. Solution of the problem. Specifically the problem is as follows: “What is
the probability that in n trials of an event it will occur x times presuming that
the probability of the event on a given trial depends only on the number of
previous successes?” Denote by $P(n, x)$ the probability of x successes in n
trials and by $p_x$ the probability of the event after x previous successes. As is
conventional, denote $q_x = 1 - p_x$; one can then formulate the following equation
of partial differences:

(1) $P(n + 1, x + 1) = p_x P(n, x) + q_{x+1} P(n, x + 1).$

This equation is an obvious consequence of the statement that x + 1 successes
in n + 1 trials can only occur if there are x successes in n trials and a success on
the (n + 1)st, or x + 1 successes in n trials and failure on the (n + 1)st. The
boundary conditions appropriate are:

(2) $P(n, x) = 0$ for $x < 0$ or $x > n$, and $P(0, 0) = 1$.

It is convenient and appropriate to generalize ( 1 ) while retaining the boundary 
conditions ( 2 ). The equation ( 1 ) will be obtained from the following equation 
by setting 3 = 1 : 

(3) P(n + 1, X + 1) = (3 — 3 »)P(n, x) + 3 *+iP(w, x + 1). 

It will be noted for further reference at this point that: 

(4) Pin, 0) = 3 ? 
and: 

(5) Pin, n) = iq- qo)iq - 31 ) • • • (3 - 3n-i). 

This last suggests a change of variable of the form: 

( 6 ) Pin, x) = Fin, x)iq - qo)iq - qi) ••• iq- 3 ,). 

Upon substituting this expression in (3) one obtains a somewhat simpler equation 
with the same boundary conditions as ( 2 ). 

(7) Fin + 1, X + 1) = Fin, x) + 3 »+iP(n, x + 1). 

Using the generating function: 

( 8 ) Gix, ^) = 2 Pin, x)r 

one may obtain from (7), using the boundary conditions (2) the following 
ordinary linear difference equation: 

(9) Gix +!,{) = mx, {) + q^+lGix + 1, {)]. 

From (4) it is easily seen that: 

(10) GiO, {) = 1/[1 - 3 ofl, 
and hence that the solution of (9) is: 

(11) Gix, {) = r/[(l - 3o{)(l - qiO •••(!- 3^)]. 

This may be expanded in partial fractions and the result written: 

(12)    $G(x, \xi) = \xi^x \sum_{i=0}^{x} q_i^x / [(q_i - q_0) \cdots (q_i - q_{i-1})(q_i - q_{i+1}) \cdots (q_i - q_x)(1 - q_i \xi)].$





By means of the relation in (8) one deduces readily that: 

(13)    $F(n, x) = \sum_{i=0}^{x} q_i^n / [(q_i - q_0) \cdots (q_i - q_{i-1})(q_i - q_{i+1}) \cdots (q_i - q_x)].$

Jordan [1, p. 19, eq. (1)] shows this to be the xth Newton divided difference of
$q^n$, where the expansion is in terms of $(q - q_0) \cdots (q - q_{x-1})$, for x = 0, 1, ..., n.
The solution of (3) can now be written as:

(14)    $P(n, x) = (q - q_0) \cdots (q - q_{x-1})\,F(n, x),$

from which follows:

(15)    $\sum_{x=0}^{n} P(n, x) = q^n.$

As remarked before, by setting $q = 1$ one obtains the solution of (1) subject to
the boundary conditions (2). 
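
The agreement between the recursion (1) and the closed form given by (13) and (14) with $q = 1$ can be checked numerically. The following is a minimal Python sketch and not part of the original note: the function names and the sample values $p_x = 1/(x + 2)$ are assumptions chosen for illustration, and the closed form is evaluated only for distinct $q_x$.

```python
from fractions import Fraction


def recursion_table(p, n_max):
    """Tabulate P(n, x) for 0 <= x <= n <= n_max from equation (1).

    p[x] is the success probability after x previous successes
    (hypothetical input; at least n_max entries are needed).
    """
    P = [[Fraction(0)] * (n_max + 1) for _ in range(n_max + 1)]
    P[0][0] = Fraction(1)  # boundary condition (2)
    for n in range(n_max):
        for x in range(n + 2):
            # x successes already present, failure on trial n + 1
            stay = (1 - p[x]) * P[n][x] if x <= n else Fraction(0)
            # x - 1 successes already present, success on trial n + 1
            move = p[x - 1] * P[n][x - 1] if x >= 1 else Fraction(0)
            P[n + 1][x] = stay + move
    return P


def closed_form(p, n, x):
    """P(n, x) from (13) and (14) with q = 1; p[0..x] must be distinct."""
    q = [1 - value for value in p[: x + 1]]
    prefactor = Fraction(1)
    for j in range(x):
        prefactor *= 1 - q[j]  # factor (q - q_j) with q = 1
    total = Fraction(0)
    for i in range(x + 1):
        denominator = Fraction(1)
        for j in range(x + 1):
            if j != i:
                denominator *= q[i] - q[j]
        total += q[i] ** n / denominator  # one term of the divided difference
    return prefactor * total


# Illustrative probabilities (an assumption): p_x = 1/2, 1/3, 1/4, ...
p = [Fraction(1, x + 2) for x in range(6)]
table = recursion_table(p, 5)
agree = all(
    table[n][x] == closed_form(p, n, x)
    for n in range(6)
    for x in range(n + 1)
)
```

Exact rational arithmetic is used here because the divided-difference sum in (13) involves heavy cancellation when the $q_x$ are close together.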

It is clear that when all the $q_i$ are equal the Bernoulli distribution should
come out as a special case. Since in this case the divided difference becomes the
corresponding derivative divided by the appropriate factorial, one obtains:

(16)    $P(n, x) = \dfrac{(1 - q_0)^x}{x!}\,\dfrac{d^x q_0^n}{d q_0^x}.$




Upon reduction this yields the usual formula, but not in the usual way. 

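By choosing $p_x = \lambda_x / n$ and allowing n to increase without limit one obtains
an analogue of the Poisson distribution, viz:

(17)    $P(x) = (-\lambda_0) \cdots (-\lambda_{x-1}) \sum_{i=0}^{x} e^{-\lambda_i} / [(\lambda_i - \lambda_0) \cdots (\lambda_i - \lambda_{i-1})(\lambda_i - \lambda_{i+1}) \cdots (\lambda_i - \lambda_x)]$

which corresponds to the expansion of $e^{-\lambda}$ about $\lambda_0, \lambda_1, \lambda_2, \ldots, \lambda_x, \ldots$ when $\lambda = 0$.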
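
The limiting formula can be compared with the finite-n recursion. The sketch below is illustrative only and rests on stated assumptions: it reads (17) as $(-\lambda_0) \cdots (-\lambda_{x-1})$ times the xth divided difference of $e^{-\lambda}$ at $\lambda_0, \ldots, \lambda_x$, the function names are hypothetical, and the rates 1, 2, 3 are arbitrary distinct values.

```python
from math import exp


def poisson_analogue(lam, x):
    """P(x) from (17); lam[0..x] must be distinct."""
    prefactor = 1.0
    for j in range(x):
        prefactor *= -lam[j]
    total = 0.0
    for i in range(x + 1):
        denominator = 1.0
        for j in range(x + 1):
            if j != i:
                denominator *= lam[i] - lam[j]
        total += exp(-lam[i]) / denominator
    return prefactor * total


def finite_n(lam, x_max, n):
    """P(n, x) for x = 0..x_max from recursion (1) with p_x = lam[x] / n."""
    probs = [1.0] + [0.0] * x_max
    for _ in range(n):
        updated = []
        for x in range(x_max + 1):
            stay = (1 - lam[x] / n) * probs[x]
            move = lam[x - 1] / n * probs[x - 1] if x >= 1 else 0.0
            updated.append(stay + move)
        probs = updated
    return probs


lam = [1.0, 2.0, 3.0]  # arbitrary distinct rates (an assumption)
limit_values = [poisson_analogue(lam, x) for x in range(3)]
approximation = finite_n(lam, 2, 20000)
```

For these rates the first two values reduce to $e^{-1}$ and $e^{-1} - e^{-2}$, and the finite-n values approach the limit at a rate of order 1/n.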


REFERENCE 

[1] Charles Jordan, Calculus of Finite Differences, Chelsea Publishing Co., New York, 
2nd ed., 1947. 


A GRAPHICAL DETERMINATION OF SAMPLE SIZE FOR WILKS’ 

TOLERANCE LIMITS 

By Z. W. Birnbaum and H. S. Zuckerman
University of Washington

1. Summary. To determine the smallest sample size for which the minimum
and the maximum of a sample are the $100\beta\%$ distribution-free tolerance
limits at the probability level $\epsilon$, one has to solve the equation

(1)    $N\beta^{N-1} - (N - 1)\beta^{N} = 1 - \epsilon$





given by S. S. Wilks [1]. A direct numerical solution of (1) by trial requires 
rather laborious tabulations. An approximate formula for the solution has 
been indicated by H. Scheffé and J. W. Tukey [2]; however, an analytic proof for
this approximation does not seem to be available. The present note describes
a graph which makes it possible to solve (1) with sufficient accuracy for all
practically useful values of $\beta$ and $\epsilon$.

2. Construction of the graph. Substituting in (1)

$N = x\,\dfrac{\beta}{1 - \beta}$

we obtain

$1 + x = (1 - \epsilon)\,\beta^{-\beta x/(1 - \beta)}$

and

(2)    $\log (1 + x) = x\,\dfrac{\beta}{1 - \beta} \log \dfrac{1}{\beta} - \log \dfrac{1}{1 - \epsilon}.$

To solve (2) graphically, one has to find the intersection of the curve

(3)    $y = \log (1 + x)$

with the line

$y = x\,\dfrac{\beta}{1 - \beta} \log \dfrac{1}{\beta} - \log \dfrac{1}{1 - \epsilon}.$

To prepare a graph on which this can be done, one first plots (3) once for all
(Figure 1, Curve C). Then one marks the points $-\log \frac{1}{1 - \epsilon}$ on the y-axis
and labels them with the values of $\epsilon$ (Figure 1, Scale I); chooses a constant r > 0
and marks the points $r \log \frac{1}{1 - \epsilon}$ on the x-axis (Figure 1, Scale II); chooses a constant
k > 0, marks the points $k r \frac{\beta}{1 - \beta} \log \frac{1}{\beta}$ on the x-axis, draws vertical lines
through each of these points, and labels them with the values of $\beta$ (Figure 1,
Scale III); draws the line x = k (Figure 1, line L); and marks the uniform Scale IV
on the x-axis.

The graph reproduced here has been prepared with r = 4, k = 5. It can
easily be verified that the instructions on the graph lead to solutions x of (2) and
$N = x\,\frac{\beta}{1 - \beta}$ of (1).




1) connect $\epsilon$ on Scale I and $\epsilon$ on Scale II with a straight line; this line cuts the vertical line marked $\beta$ on Scale III at point P,

2) locate on line L the point with the ordinate of P; call this point Q,

3) connect $\epsilon$ on Scale I with Q; the connecting line cuts curve C at a point which has abscissa x on Scale IV; read off x,

4) compute $N = x\,\frac{\beta}{1 - \beta}$.
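
That steps 1) to 3) produce a line of the required slope can be checked with a few lines of arithmetic. The sketch below is an illustration rather than part of the note; it assumes that Scale I carries the points $(0, -\log\frac{1}{1-\epsilon})$, Scale II the points $(r \log\frac{1}{1-\epsilon}, 0)$, and Scale III the vertical lines at $x = k r \frac{\beta}{1-\beta}\log\frac{1}{\beta}$, and the function name is hypothetical.

```python
from math import isclose, log


def construction_slope(beta, eps, r=4.0, k=5.0):
    """Return (slope of the line drawn in step 3, slope required by (2))."""
    a = log(1 / (1 - eps))                    # intercept magnitude
    m = beta / (1 - beta) * log(1 / beta)     # slope required by (2)
    scale_1 = (0.0, -a)                       # point for eps on Scale I
    scale_2 = (r * a, 0.0)                    # point for eps on Scale II
    x_beta = k * r * m                        # vertical line for beta on Scale III
    # Step 1: the line through the two eps points meets x = x_beta at P.
    slope_12 = (scale_2[1] - scale_1[1]) / (scale_2[0] - scale_1[0])
    p_ordinate = scale_1[1] + slope_12 * x_beta
    # Step 2: Q lies on line L (x = k) at the ordinate of P.
    q_point = (k, p_ordinate)
    # Step 3: slope of the line from the Scale I point to Q.
    drawn = (q_point[1] - scale_1[1]) / (q_point[0] - scale_1[0])
    return drawn, m


drawn_slope, required_slope = construction_slope(0.95, 0.90)
```

Because the line drawn in step 3 also passes through the Scale I point, matching slopes mean it coincides with the line whose intersection with curve C solves (2).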





































3. Improvement by iterations. The graphical solution, usually accurate to
two significant digits, may be improved easily by iterations. Replacing (2)
by the equation

(4)    $x = f(x) = \dfrac{1 - \beta}{\beta \log \frac{1}{\beta}} \left[ \log (1 + x) + \log \dfrac{1}{1 - \epsilon} \right]$

one obtains iterations $x_{s+1} = f(x_s)$ which, for $.80 < \epsilon < .999$ and $.80 < \beta < .999$,
converge rapidly to the solution of (2).

Example. For $\epsilon = .99$, $\beta = .999$, one finds graphically $x_1 = 6.6$, and from
(4) the corresponding iteration formula, which yields the values $x_2 = 6.642$,
$x_3 = 6.648$, $x_4 = 6.649$, $x_5 = 6.649$. Rounding up we obtain the sample
size $N = 6.649 \cdot 999 = 6643$.

For $\epsilon$ and $\beta$ between .80 and .999 all iterations obtained from (4) are on the
same side of the exact solution and converge to it monotonically. Thus, in our
example, from $x_1 < x_2$ we conclude that $x_1$ as well as all further iterations are
smaller than the exact solution.
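
The iteration can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' procedure: equation (4) is taken as $x = [\log(1 + x) + \log\frac{1}{1-\epsilon}] / [\frac{\beta}{1-\beta}\log\frac{1}{\beta}]$, the tolerance equation as $N\beta^{N-1} - (N-1)\beta^N = 1 - \epsilon$ with $N = x\beta/(1-\beta)$, the starting value replaces the graphical reading, and the function names and the choice $\epsilon = \beta = .95$ are hypothetical.

```python
from math import ceil, log


def wilks_sample_size(beta, eps, tol=1e-12, max_iter=1000):
    """Iterate x -> f(x) and return (x, smallest integer N >= x*beta/(1-beta))."""
    slope = beta / (1 - beta) * log(1 / beta)
    intercept = log(1 / (1 - eps))
    x = intercept / slope  # starting value below the root (stands in for the graph)
    for _ in range(max_iter):
        x_next = (log(1 + x) + intercept) / slope
        converged = abs(x_next - x) < tol
        x = x_next
        if converged:
            break
    return x, ceil(x * beta / (1 - beta))


def tolerance_lhs(n, beta):
    """Left side N*beta**(N-1) - (N-1)*beta**N of the tolerance equation."""
    return n * beta ** (n - 1) - (n - 1) * beta ** n


x_star, n_star = wilks_sample_size(0.95, 0.95)
```

Since the left side of the tolerance equation decreases in N over the range considered, the returned N can be checked directly: it should be the first integer at which `tolerance_lhs(N, beta)` does not exceed $1 - \epsilon$.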


References

[1] S. S. Wilks, Mathematical Statistics, Princeton University Press, 1943, p. 94.

[2] H. Scheffé and J. W. Tukey, “A formula for sample sizes for population tolerance
limits,” Annals of Math. Stat., Vol. 15 (1944), p. 217.



ABSTRACTS OF PAPERS 

(Abstracts of papers presented at the New York meeting of the Institute on April 

1. Adjustment of an Inverse Matrix Corresponding to a Change in One Ele¬ 
ment of a Given Matrix. Jack Sherman and Winifred J. Morrison, The 
Texas Company Research Laboratories, Beacon, New York. 

If one element, $a_{rs}$, in a square matrix A is changed by an amount $\Delta a_{rs}$, all the elements
$b_{ij}$ in the inverse matrix B are generally changed. A simple equation has been derived by
means of which the elements $b'_{ij}$ in the resulting inverse matrix B' can be computed directly
in terms of $\Delta a_{rs}$ and the elements of B. The equation is

$b'_{ij} = b_{ij} - \dfrac{b_{ir}\, b_{sj}\, \Delta a_{rs}}{1 + b_{sr}\, \Delta a_{rs}}.$

It follows that any given square matrix can be transformed into a singular matrix by 
increasing any one element in the transposed inverse matrix. 
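
A small worked example may make the update concrete. The sketch below is illustrative and not taken from the abstract: it assumes the equation reads $b'_{ij} = b_{ij} - b_{ir} b_{sj} \Delta a_{rs} / (1 + b_{sr} \Delta a_{rs})$, uses zero-based indices, and the function names and the 2 by 2 matrix are hypothetical.

```python
from fractions import Fraction


def update_inverse(B, r, s, delta):
    """Inverse of A after a[r][s] is increased by delta, given B = inverse of A."""
    denominator = 1 + B[s][r] * delta
    if denominator == 0:
        # The changed matrix has no inverse in this case.
        raise ZeroDivisionError("the changed matrix is singular")
    size = len(B)
    return [
        [B[i][j] - B[i][r] * B[s][j] * delta / denominator for j in range(size)]
        for i in range(size)
    ]


def matmul(X, Y):
    """Plain matrix product of two square lists of lists."""
    size = len(X)
    return [
        [sum(X[i][k] * Y[k][j] for k in range(size)) for j in range(size)]
        for i in range(size)
    ]


# Hypothetical example: A = [[2, 1], [1, 1]] has inverse B = [[1, -1], [-1, 2]].
A = [[Fraction(2), Fraction(1)], [Fraction(1), Fraction(1)]]
B = [[Fraction(1), Fraction(-1)], [Fraction(-1), Fraction(2)]]
r, s, delta = 0, 1, Fraction(2)

A_changed = [row[:] for row in A]
A_changed[r][s] += delta            # element a_rs increased by delta
B_changed = update_inverse(B, r, s, delta)
product = matmul(A_changed, B_changed)
```

In this example the denominator $1 + b_{sr} \Delta a_{rs}$ vanishes for $\Delta a_{rs} = 1$, which is exactly the change that makes the matrix singular.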

2. The Distribution of the Number of Exceedances. E. J. Gumbel, New York 
and H. von Schelling, Naval Research Laboratory, New London, Conn. 

The probability for the mth observation in a sample of size n taken from a population 
with an unknown distribution of a continuous variate to be exceeded x times in N future 
trials is studied. The averages, moments, and the cumulative probability of the number 
of exceedances are calculated with the help of the hypergeometric series. The tolerance 
limits constructed by Wilks are special cases of the cumulative probability. The mean 
number of exceedances is the same as in Bernoulli's distribution. In some cases there are 
two modes, namely m - 1 and m - 2. If n = N, the most probable number of exceedances
over the mth largest value is either m, or m - 1, and the median number of exceedances is
equal to m - 1. In 50% of all cases, the largest (smallest) of n past observations will not
(always) be exceeded in n future observations. If n and N are both large and equal, the
distribution of the number of exceedances over the median is normal whereas the distribu¬ 
tion of the extremes, similar to Poisson’s distribution, has a mean m, and a variance 2m. 
The variance of the number of exceedances is largest for the median, and smallest for the 
extremes of the previous sample. These distribution-free methods may be applied to 
meteorological phenomena, such as floods, droughts, extreme temperatures (the killing 
frost), largest precipitations, etc., and permit the forecasting of the number of cases sur¬ 
passing a given severity. 

3. Note on the Power Function of a Quality Control Chart. Leo A. Aroian, 
Hunter College, New York. 

The power function of a quality control chart is given for a sequence of N sample points 
in terms of $\alpha$ and $\gamma$, the probability of a Type I error and the power function respectively for
a single sample point. Two different models are considered and the generalization to two 
quality control charts is indicated. 

4. Tests Between Two Means or Regression Coefficients When Observations
are of Unequal Precision. Uttam Chand, University of North Carolina, 
Chapel Hill. 

Relative merits of different tests available for testing two means or two regression coef¬ 
ficients in relation to asymmetric and symmetric aspects of Student’s hypothesis in case 
of unequal population variances have been reconsidered. In this connection the distribution
of a certain quantity $t_k$, where k is some inexact value of the unknown ratio of variances,
has been obtained. The hypothesis of the equality of two linear regression functions in 
case of unequal residual variances has also been considered. 

5. Functional Expansions. Eugene W. Pike, Boston, Massachusetts. 

This paper calls attention to a new type of estimation problem, arising both in the inter¬ 
pretation of experimental data from complex experiments, and in the design of analogue 
computers for functions of several independent variables. 

It has long been known, though not widely recognized, that the partial sums of rows and 
columns arising in the bivariate analysis of variance represent the least squares fit of a 
functional form [f(x) + g(y)] to a tabular function F(x, y) of two independent variables, for
example. More recently, several people have realized gradually that independent causes
may combine in much more complicated ways to produce a common effect, and that correspondingly
more complicated functional combinations, such as [f(x) + g(y) + h(x)·k(y)],
can be fitted by least squares to tabular functions of x and y.

Examples of such expansions, as applied both to the design of computers and to the 
analysis of experimental data, will be given. 

This presentation is based on work supported by the Air Materiel Command, USAF. 

6. The Geometric Range for Distributions of Cauchy’s Type. E. J. Gumbel, 
New York City, and R. D. Keeney, Metropolitan Life Insurance Company,
New York City. 

From each of N samples of large size n the largest and the smallest values $X_{n,\nu}$ and $X_{1,\nu}$
($\nu = 1, 2, \ldots, N$) are taken, where each X is measured from the central value of Nn observations.
The sample size must be so large that the probability of any extreme $X_{n,\nu}$ and $-X_{1,\nu}$
being negative may be neglected. The distribution of the geometric means $\rho$ of the N pairs
of extremes, henceforth called geometric ranges, is derived under the assumption that the
initial distribution is symmetric, unlimited and of the Cauchy type, which implies that the
moments of an order equal to, or larger than, k (k > 0) diverge. Let u be the expected largest
value. Then the probability density of obtained from a theorem of Elfving

(Biometrika, Vol. 35 ) is ^kKoi^k) where Ko is a Bessel function. This permits calculation of 
all moments of • Methods are given for estimating the parameters u and k. The distribution
of the geometric ranges $\rho$ is again a Bessel function. A probability paper is constructed
for testing the hypothesis that the initial distribution is of Cauchy's type. A
strict parallelism is established between the asymptotic distributions of the range for the 
exponential type, and of the geometric range for Cauchy’s type. This provides a criterion 
to which of the two types the initial distribution belongs. 

7. On Sums of Random Integers Reduced Modulo m. A. Dvoretzky, Insti¬ 
tute for Advanced Study, Princeton and J. Wolfowitz, Columbia Univer¬ 
sity, New York City. 

Let $X_n$ ($n = 1, 2, \ldots$) be an infinite sequence of independent, integral-valued chance
variables, and let m be any fixed integer greater than 1. Put $S_n = X_1 + \cdots + X_n$ and denote $S_n$
reduced mod m by $Y_n$; i.e., $Y_n$ is a random variable which assumes only the values j =
1, 2, ..., m with respective probabilities $P_n(j) = \mathrm{Prob}\{S_n \equiv j \pmod{m}\}$. Necessary and
sufficient conditions are obtained for $Y_n$ to be equidistributed in the limit, i.e., for
$\lim_{n \to \infty} P_n(j) = 1/m$ (j = 1, 2, ..., m). Some easily applicable sufficient conditions are deduced
and the cases m = 2, 3, 4 are studied in detail. The rapidity with which $P_n(j)$ approaches 1/m is also
studied.





8. The Corpuscle Problem: Estimating the Surface-Volume Ratio of a Cor¬ 
puscle of Arbitrary Shape. Jerome Cornfield, National Institutes of 
Health and Harold W. Chalkley, National Cancer Institute, Bethesda,
Md.

Consider a space containing F, a closed figure of arbitrary shape, volume V and surface
area S. Let a line segment of length r be thrown in the space in such a fashion that we have
uniform distribution of the probabilities that the end point P occupies any position in the
space and that the other end point P' occupies any position on the surface of a sphere of
radius r with center at P. Count the number of end points falling in F (0, 1 or 2 for a single
throw), call it the number of hits, and denote it by h. Count the number of times the line
intersects the surface (0, 1 or 2 times for a single throw for a non-reentrant figure, possibly
more for a re-entrant one), call it the number of cuts and denote it by c. Then, it is proved
that rE(h)/E(c) = 4V/S. This result is intended to provide a theoretical basis for estimating
the surface-volume ratio of physical objects of any shape.

9. Generalized Hit Probabilities with a Gaussian Target. D. A. S. Fraser, 
Princeton University. 

In the Supplement to the Journal of the Royal Statistical Society, Vol. 8 (1946), L. B. C.
Cunningham and W. R. B. Hynd proposed a problem and gave an approximate solution cov¬ 
ering a partial range of parameter values: to find the probability that a moving target will 
survive a burst of rounds from a rapid-firing gun, account being taken of correlation 
between the different points of aim. 

Generalizing from the case of a two dimensional target to dimensions, this paper
gives the probability for 0, 1, 2, ..., n hits, under the following assumptions: the “n” points
of aim have a Multivariate Gaussian Distribution, the dispersion error has a Gaussian Distribution,
and the target is a Gaussian Diffuse Target, that is, the probability of a hit on a
particular round as a function of the coordinates of the shell has the form of “a constant
times a Gaussian probability density function.”

Limiting distributions are obtained as $n \to \infty$, subject to a variety of limiting conditions.

Numerical values for the probability of at least one hit are plotted when n = 5, for a
range of values, relative to the target size, of dispersion and aiming errors. 

10. A New Continuous Sampling Inspection Plan Based on an Analysis of Costs.
F. E. Satterthwaite, General Electric Company, Bridgeport, Connecticut. 

Inspection, like all other industrial operations, must be run to produce the most return 
for the lowest cost. The costs include overhead and running inspection costs; complaint 
costs; rework and scrap costs; and the costs of unnecessary process rejections. Also one 
must consider the frequencies of occurrence of these costs. These include the process aver¬ 
age percent defective; the probability of occurrence of a complaint; and the frequency of 
occurrence of quality deteriorations. 

For continuous inspection, the percentage of the product to be inspected has a very
simple formula: $P = \sqrt{SC/HM}$, where S is the sensitivity of the sampling plan used, C is
the complaint cost, H is the effective inspection cost, and 1/M is the quality deterioration
rate.

It was also necessary to develop a new continuous sampling inspection plan which would 
be efficient over the entire range of continuous sampling applications. The plan presented
is a sequential plan which, with suitable attention to details, is easily applied on the shop 
floor. The Dodge Plan is a special case and is efficient only in a small percentage of appli¬ 
cations. 





11. On the Levels of Significance of the F and Beta Distributions. Leo A. 
Aroian, Hunter College, New York. 

Two formulas are given for the determinations of the levels of significance of the F and 
Beta distributions. In the case of the F distribution a previous set of formulas (Biometrika, 
Vol. 34, pp. 359-360) is modified to give 3 significant figure accuracy, $n_1, n_2 \geq 24$. The set
for the Beta distribution is of Cornish-Fisher type, $p, q \geq 6$. The advantage of these over
Paulson’s F formula and Carter’s z formula are the avoidance of the solution of a quadratic 
in the case of Paulson’s formula, and the avoidance of the exponential tables in the case of 
Carter’s z formula. A short numerical table compares the three methods for selected values 
of $n_1$ and $n_2$.


12. Certain Statistics for Samples of 3 From a Rectangular Population. Julius
Lieblein, National Bureau of Standards. 


A continuation of a study presented at the Madison meeting of the Institute of Mathematical
Statistics last September. (For abstract see Annals of Math. Stat., December 1948,
p. 595.) The previous paper derived properties of the statistics


yi 




2/8 


-I- 

2 


where $x_1, x_2, x_3$ are the observations, ordered by increasing size, in an independent random
sample of three observations from a normal population, and $x'$ and $x''$, $x' \leq x''$, are the two
closest of the three. In the present paper distributions (joint as well as simple) are obtained 
for the above three statistics and also for x'", the remaining observation not included in 
the closest pair, for samples of 3 from a rectangular population, and a theorem is proved 
concerning the distribution of yi for a wide class of continuous populations. 


13. The Choice of Lot Inspection Plans on the Basis of Cost. F. E. Satterthwaite
and Burton Grad, General Electric Company, Bridgeport, Connecticut.

An extension of the first paper to single sampling inspection plans. The important con¬ 
cepts involved are the break-even quality level, the operating ratio, and the weighted prior 
odds that a lot is a good lot. Charts are being prepared which can be entered with simple 
functions of the costs and which give directly the sample size and acceptance number for 
the most efficient single sampling inspection plan. 

It appears promising that the method can be extended to double and sequential sampling 
plans. This is imperative because of the large portion of the time that “no-inspection” is
the most efficient single sampling plan. 



NEWS AND NOTICES 

Readers are invited to submit to the Secretary of the Institute news items of interest 

Personal Items 

Enrique Loizelier Blanco, Professor of Statistics in the University of Madrid, 
has just finished the first year of experimentation in Quality Control Methods 
in different plants. Interest in these new statistical applications started
in Spain during 1946 and has increased rapidly since then, especially this year
after consecutive bimonthly intensive courses which Professor Blanco has been 
teaching. 

Mr. Osmer Carpenter, formerly an Instructor in the Department of Statistics 
and Mathematics at Iowa State College, is now doing statistical work for Carbide
and Carbon Chemical Corp., Oak Ridge, Tennessee. 

Dr. K. L. Chung, formerly of Princeton University, has been appointed to an 
assistant professorship at Cornell University. 

Dr. Clyde H. Coombs, Associate Professor of Psychology and Chief of Re¬ 
search Division, Bureau of Psychological Services at the University of Michigan, 
is on leave of absence for the academic year to work at Harvard University on 
problems of scaling. 

Dr. Meyer A. Girshick, formerly with the Douglas Aircraft Co., Santa Monica, 
California, has accepted a professorship in the Department of Statistics, Stanford 
University, Stanford, California. 

Dr. M. J. Gottlieb, who has been with the Institute for Advanced Study at 
Princeton, has been appointed to an assistant professorship at the Newark 
College of Rutgers University. 

Associate Professor E. H. C. Hildebrandt of Northwestern University has 
been elected President of the National Council of Teachers of Mathematics. 
He is also National Secretary-Treasurer of Pi Mu Epsilon and Secretary of the 
Mathematics Section of the Central Association of Science and Mathematics 
Teachers. 

Dr. C. A. Hollingsworth, formerly with the Acetate Section of the DuPont 
Company, is now an instructor in the Department of Chemistry, University of 
Pittsburgh. 

Professor William G. Madow, who has been with the Institute of Statistics 
at the University of North Carolina, has been appointed Professor of Statistics 
at the University of Illinois. 

Dr. Zenon Szatrowski, formerly teaching in the Economics Department of 
Northwestern University, has accepted an associate professorship in the Depart¬ 
ment of Economics, University of Oregon, Eugene, Oregon. 

Mr. Eric Weyl has resigned his position as staff engineer in the Chicopee Manufacturing
Corporation and is now conducting his own business as a textile engineering
consultant in Manchester, New Hampshire.



New Members 

The following persons have been elected to membership in the Institute (December 1, 
1948 to February 28, 1949). 

Abruzzi, Adam, M.S. (Columbia Univ.) Student in engineering at Columbia University, 
W. 107th Street, Shanks Village, New York.

Agarwal, Satya P., M.A. (Agra Univ., India) Student at University of California, International
House, Berkeley, California.

Anderson, Robert W., M.A. (Columbia Univ.) Student at Columbia University, 

Road, Queens Village 9, New York. 

Bahadur, R. R., M.A. (Univ. of Delhi, India) Graduate Student at University of North 
Carolina, Chapel Hill, North Carolina. 

Blom, Gunnar, Fil.kand. (Stockholm) Olof Skotkonungs vag 8, Aspudden, Sweden.
Burrows, Glenn L., M.A. (Michigan State College) Research Associate, P.O. Box 168,
Institute of Mathematical Statistics, Chapel Hill, North Carolina. 

Chapman, Carlos A., Jr., M.S. (Univ. of Michigan) Sales Statistician, Argus, Inc., Ann 
Arbor, Michigan, SSJf. W. Huron St., Ann Arbor, Mich. 

Chiang, Chin Long, M.A. (Univ. of Calif.) Student at the University of California, SS6-A 
Panoramic Way, Berkeley 4, California. 

Coggins, Paul B., M.S. (Univ. of Wisconsin) Graduate Teaching Assistant, University 
of Michigan, University Club, Madison 5, Wisconsin. 

Crapsey, Marcus T., A.B. (Univ. of Michigan) Graduate student at the University of 
Michigan, Monroe, Ann Arbor, Michigan, 

Coy, John W., M.A. (Univ. of New Mexico) Teaching Fellow, Department of Mathe¬ 
matics, University of Michigan, Whitewood, Ann Arbor, Michigan, 

Cutkosky, Richard E., Student at Carnegie Institute of Technology, Box Jftly Carnegie 
Institute of Technology, Pittsburgh, Pennsylvania, 

DelPriore, Francis R., B.A. (New York Univ.) Associate Statistician, U. S. Naval En¬ 
gineering Experiment Station, 2609~22nd. Street, N.E., Washington 18, D. C. 

Desind, Philip, M.S. (College of City of N. Y.) Statistician, Bureau of Ships, Navy 
Department, Washington, D. C., 7418 Georgia Ave., N.W., Washington, D, C, 

Dutka, Solomon, M.A. (Columbia Univ.) Chief Statistician, % Elmo Roper, 30 Rockefeller
Plaza, New York City, New York.

Dwass, Meyer, B.A. (George Washington Univ.) Graduate student at Columbia Uni¬ 
versity, Apt. SAy 609 W, 115 St.y New York, New York, 

Eastman, Walter F., A.B. (Harvard) Central Technical Department, The American 
Brass Co., Waterbury, Connecticut. 

Eisenpress, Harry, B.A. (College of City of N. Y.) National Bureau of Economic Re¬ 
search, 1819 Broadway, New York 23, New York, 29S5 Ocean Parkway, Brooklyn 24, 
New York. 

Fellows, Clifford Martin, B.S. (Boston Univ.) Assistant Instructor, Boston University, 
Bureau of Research and Statistics, 685 Commonwealth Avenue, Boston 15, Massachusetts, 
Gowen, John W., Ph.D. (Columbia Univ.) Professor of Genetics, Genetics Department, 
Iowa State College, 2014 Kildee, Ames, Iowa, 

Greenwood, Robert E., Ph.D. (Princeton Univ.) Assistant Professor of Applied Mathe¬ 
matics, University of Texas, 1704 Windsor Road, Austin, Texas. 

Hald, Anders, Ph.D. (Univ. of Copenhagen) Professor of Statistics, University of Copen¬ 
hagen, Emdrupvenge 94^ Copenhagen 0, Denmark, 

Helms, William R., Student at Ohio State University, Stadium Club, Ohio State University, 
Columbus 10, Ohio, 

Hemphill, F. M., M.S.Ph. (Univ. of Michigan) Major, U. S. Public Health Service, School 
of Public Health, University of Michigan, Ann Arbor, Michigan. 





Himes, Harold W., B.S. (George Pepperdine College, Los Angeles) Statistician, Test 
Design and Analysis Section, U.C.D.W.R., U. S. Navy Electronics Laboratory, San
Diego 52, California. 

Hutchinson, L. Charles, Ph.D. (Mass. Institute of Tech.) Associate Professor of Mathe¬ 
matics, Polytechnic Institute of Brooklyn, Brooklyn, New York. 

Klahr, Carl N., M.S. (Carnegie Institute of Tech.) Student, Atomic Energy Commission 
Fellow, Carnegie Institute of Technology, 5557 Phillips Avenue, Pittsburgh 17, Penn¬ 
sylvania. 

Kraemer, Herbert F., B.S. (Univ. of Delaware) Statistical Engineer, Technical Super¬ 
visor, Commercial Solvents Corporation, Terre Haute, Indiana, 16H South 7th St., 
Terre Haute, Indiana. 

Kuebler, Roy R., Jr., A.M. (Univ. of Pennsylvania) Associate Professor of Mathematics, 
Dickinson College, Carlisle, Pennsylvania. 

Lafontant, Herne £., M.S. (Atlantic Univ.) Student at the University of Michigan, 
615 Monroe, Ann Arbor, Michigan. 

Lai, Dip Naravan, Ph.D. (Edinburgh Univ.) Lecturer in Mathematics, Patna University, 
New Dak Bungalow Road, Patna, Bihar, India. 

Liserre, Guido Orlando G., Profesor de Estadistica, Mendoza 26JiO, Rosario, R., Argentina. 

Matson, J. H., B.A. (Univ. of Wisconsin) Statistician, Baker Manufacturing Company, 
Evansville, Wisconsin. 

Monsch, Henry D., B.S. (Missouri School of Mines & Metallurgy, Rolla) Metallurgist,
Aluminum Company of America, Fabricating Division, Alcoa, Tennessee, 6607 Lake 
Shore Drive, Knoxville, Tennessee. 

Moore, Lucius T., Ph.D. (Johns Hopkins Univ.) Associate Professor, Department of
Mathematics, Brooklyn College, 205 Hicks Street, Brooklyn, New York. 

Noack, Albert, Ph.D. (Kiel, Germany) Privatdozent, Studienrat, (24a) Hamburg- 
Lokstedt II, Tibarg 26, Germany. 

Patton, Robert E., A.B. (N. Y. State Teachers College, Albany) Graduate student at the 
University of Michigan, 622 Linden St., Ann Arbor, Michigan. 

Potter, Muriel, Ph.D. (Columbia Univ.) Instructor in Psychological Foundations, Edu¬ 
cational Research and Reading Supervisor, Teachers College, Columbia University, 
414 Riverside Drive, New York 26, New York. 

Putz, Robert R., B.A. (Univ. of Minnesota) Teaching Assistant, Department of Mathe¬ 
matics, University of California, 16S1 Cornell Avenue, Berkeley 2, California. 

Ratoosh, Philburn, M.A. (Columbia Univ.) Assistant in Psychology, Department of
Psychology, Columbia University, New York 27, New York. 

Richardson, Wyman, Jr., S.B. (Harvard) Graduate student at the University of North 
Carolina, 208-B, Chapel Hill, North Carolina. 

Rosenbaum, Sidney, M.A. (Cambridge) Scientific Officer, Ministry of Works, 31, Multon 
House, Shore Place, London E.g., England. 

Savage, I. Richard, M.S. (Univ. of Michigan) Student at Columbia University, 1414 
John Jay Hall, New York 27, New York. 

Sheerin, Gail, A.B. (Univ. of Rochester) Statistical Technician, A.E.C. Project, Uni¬ 
versity of Rochester, 1091 Highland Avenue, Rochester, New York. 

Siegert, Arnold J. F., Ph.D. (Leipzig, Germany) Professor of Physics, Department of 
Physics, Northwestern University, Evanston, Illinois. 

Simpson, Paul B., Ph.D. (Cornell Univ.) Assistant Professor of Economics, Department 
of Economics, Stanford University, California. 

Solem, Anson D., M.S. (Harvard Univ.) Chief of Fragmentation Section, Naval Ordnance 
Laboratory, White Oak, Maryland, 121 Galveston St., S.W., Washington 20, D. C. 

Sorensen, Frederick A., B.S. (Carnegie Institute of Tech.) Teaching Assistant in Mathe¬ 
matics, Carnegie Institute of Technology, 1204 East End Avenue, Pittsburgh 18, Penn¬ 
sylvania. 





Steel, Robert G. C., M.A. (Acadia Univ., Canada) Instructor and Research Associate,
Statistical Laboratory, Iowa State College, Ames, Iowa. 

Taylor, Francis B., A.M. (Columbia Univ.) Instructor in Mathematics, Manhattan 
College, New York and Graduate student at Columbia University, S^5 E, 19SSi,, Bronx 
New York. 

Terrell, James R., A.B. (Univ. of Michigan) Statistical Clerk, Research Center for 
Group Dynamics, P.O. Box 367, Ann Arbor, Michigan. 

Tick, Leo J., B.S. (Iowa State College) Research Graduate Assistant, Statistical Labora¬ 
tory, Iowa State College, Ames, Iowa. 

Tyler, Sylvanus A., S.M. (Univ. of Chicago) Associate Mathematician (Biometrics), 
Argonne National Laboratory, P.O. Box 5207, 9059 So. Stewart Avenue, Chicago 30, 
Illinois. 

Tysver, Joseph B., M.A. (Washington State College) Teaching Fellow, University of 
Michigan, HOJh Erving Court, Willow Run Village, Michigan. 

Umarji, Raghavendra R., A.M. (Columbia Univ.) Lecturer in Mathematics, Bombay 
Educational Service, 509 John Jay Hall, Columbia University, New York 27, New York. 

Wilburn, A. J., A.B. (Howard Univ.) Statistician, Civil Aeronautics Board, Washington, 
D. C., 36-46th Place., N.E., Washington, D. C. 


Correction 

The information following Paul Koditschek’s name which appeared in the March issue 
of the Annals, page 149, should have appeared as follows: 

Koditschek, Paul, LL.D. (Univ. of Vienna) Research Associate, Scientific Research Service,
319 W. 13th Street, New York 11, New York.

(It was implied in the original notice that Scientific Research Service is connected with 
Columbia University.) 


News Item from Cornell 

With the continued support of a research contract with the Office of Naval Research, the 
Mathematics Department of Cornell University is further expanding research and instruc¬ 
tion in the theory of probability and its applications. At present Professors Feller, Kac, 
Chung and Dr. Donsker are participating in the work. Professor G. Elfving of the Uni¬ 
versity of Helsingfors has been appointed Visiting Professor of Mathematical Statistics 
for the academic years 1949-1951. Professor J. L. Doob, on sabbatical leave from the University
of Illinois, will spend the year 1949-50 at Cornell. Dr. Gilbert Hunt has been appointed
Assistant Professor of Mathematics.



REPORT ON THE NEW YORK MEETING OF THE INSTITUTE 


The thirty-eighth meeting of the Institute of Mathematical Statistics was 
held at Columbia University, New York City, on Friday afternoon and Saturday,
April 8-9, 1949. The meeting was attended by 93 persons including the following
80 members of the Institute:

A. Abruzzi, T. W. Anderson, Leo A. Aroian, Robert Bechhofer, A. A. Bennett, Joseph 
Berkson, Allan Birnbaum, C. I. Bliss, Paul Boschan, P. G. Carlson, Uttam Chand, Yunien 
Chen, E. P. Coleman, T. F. Cope, Jerome Cornfield, L. M. Court, M. I. Cropsen, J. H. Cur¬ 
tiss, Cuthbert Daniel, F. R. Del Priore, W. E. Deming, J. A. Dudman, David Durand, 
C. W. Dunnett, A. Dvoretzky, P. S. Dwyer, Churchill Eisenhart, H. L. Edgett, Harry 
Eisenpress, Lillian R. Elveback, D. A. S. Fraser, Murray Geisler, L. A. Goodman, J. I. 
Griffin, C. C. Grove, E. J. Gumbel, Miriam S. Harold, Mina Haskind, L. H. Herbach, Harold
Hotelling, Cuthbert Hurd, Arthur Kaufman, Roger D. Keeney, Paul Koditschek, Carl F. 
Kossack, Howard Levene, Jack Laderman, I. D. Lorge, C. L. Marks, Paul Meier, Frederick
Mosteller, E. B. Mundie, C. M. Mottley, I. U. Mulk, Paul Neurath, G. E. Noether, Doris 
Newman, M. L. Norden, E. W. Pike, J. K. Perrin, H. M. Rosenblatt, Frank Saidel, William 
Salkind, F. E. Satterthwaite, Richard Savage, Henry Scheffé, H. L. Seal, Jack Sherman,
Rosedith Sitgreaves, J. H. Smith, J. J. Sodano, Herbert Solomon, Mary N. Torrey, J. W.
Tukey, S. S. Wilks, D. F. Votaw, Helen M. Walker, Lionel Weiss, Jack Wolfowitz and 
W. W. Wryht. 

The Friday afternoon session consisted of a Symposium on Applications of 
Multivariate Analysis, Professor S. S. Wilks of Princeton University presiding.
The following two invited papers were given: 

1. Tests of Differences in Composite Growth Measurements in Pig Feeding Trials, J. Wishart,
Cambridge University and University of North Carolina. 

2. Fields of Application of Multivariate Analysis, Harold Hotelling, University of North 
Carolina. 

The prepared discussion was presented by Professor S. N. Roy, Presidency 
College, Calcutta, and Columbia University, followed by discussion from the 
floor. 

The Saturday morning session was opened by a business meeting, Dr.
Churchill Eisenhart, National Bureau of Standards, presiding. Among other 
items of business the Constitution of the Institute was amended to provide for 
Institutional Membership, and the by-laws amended to specify the status and 
privileges of Institutional Members. The revised Constitution and By-Laws 
appear elsewhere in this issue. 

The second part of the session, Dr. W. Edwards Deming presiding, was
devoted to an invited address: Non-Linear Regression Laws and "Internal Least
Squares," by Dr. H. O. Hartley, University College, London and Princeton
University.

At the Saturday afternoon session, Professor Henry Scheffé, Columbia Uni-



CONSTITUTION AND BY-LAWS 


ARTICLE 4 
Council 

The Council shall consist of not less than twelve elected members in addition 
to the Officers of the Institute except that vacancies in the Council occurring 
subsequent to an election shall not be filled until the next annual election. 

Elected members shall be elected for terms of three years, the terms of
approximately one-third of them terminating each year.

The Council, representing the Members, shall determine the policies and 
supervise the affairs of the Institute in accordance with any Bylaws the Institute 
may adopt. It shall determine the standing committees of the Institute and 
the number of elected members of the Council. 

The Council shall elect the Secretary, the Treasurer, and the Editor, by 
majority vote. The Council shall determine the number, if any, of Associate 
Secretaries, Associate Treasurers and Associate Editors. The Secretary shall 
nominate Associate Secretaries, the Treasurer shall nominate Associate 
Treasurers, and the Editor shall nominate Associate Editors which the Council 
may elect by majority vote. Such Associate Secretaries, Treasurers, and 
Editors shall be non-voting members of the Council. 

The Council shall meet at least twice a year, usually at times of meetings of 
the Institute, and otherwise at the call of the President or the call of any five 
members of the Council. Any voting member unable to be present may appoint, 
in writing, a representative to speak for him, and such representative shall be 
entitled to vote. A quorum shall be seven persons entitled to vote. Majorities 
and other fractions of the Council are to be based on the number of persons 
present and entitled to vote. 

ARTICLE 5 
Executive Committee 

The Officers shall constitute the Executive Committee of the Council, and 
shall conduct the affairs of the Institute. 

The Executive Committee may create temporary committees with assigned 
tasks coming within the scope of the Institute. 

ARTICLE 6 

Nominations 

The President shall appoint a Nominating Committee and shall announce 
their names at the annual meeting when he retires from office. This Committee 
shall submit to the Members, through the Secretary and at least sixty days 
before the closing of polls at the next succeeding annual meeting, one nomination 
for President-Elect and a slate containing at least twice as many names as there 
are vacancies on the Council. 

Additional nominations may be made for President-Elect or for the Council 
by a petition signed by twenty Members. Such nominations shall appear on 





the ballot if they are in the hands of the Secretary at least 30 days before the 
closing of polls at the next succeeding annual meeting. In any event, Members 
may vote for names in addition to those nominated. 

ARTICLE 7 
Fellows 

The Council may, by majority vote, elect to fellowship any Member nominated
by the Committee on Fellows. Such nomination and election shall be on
the basis of the nominee’s contributions to the development, dissemination, and 
application of mathematical statistics. 

ARTICLE 8 
Committee on Fellows 

The Council shall elect two Fellows annually to serve for three years on the 
Committee on Fellows. One of the Members whose term is next to expire shall 
be designated by the President as chairman. 

ARTICLE 9 

Publications 

The Annals of Mathematical Statistics shall be the official journal of the
Institute. Other publications may be authorized by the Council.

The publications of the Institute shall be supervised by the Editor, with the 
assistance of the Associate Editors and such committees as the Council may 
approve. 

ARTICLE 10 
Communications 

Public announcements concerning the Institute, including statements of policy, 
recommendations, reports of committees and accounts of Council meetings shall 
be issued by the Secretary or the President with the prior approval of the Council 
or its Executive Committee. Advance publicity concerning meetings may be 
released by authorized Program Committees or Publicity Committees. 

ARTICLE 11 

Affiliation 

By a three-fourths vote, the Council may authorize the affiliation of the 
Institute with any organization whose aims are consistent with those of the 
Institute. 


ARTICLE 12 
Amendments 

This constitution may be amended by an affirmative two-thirds vote of those 
Members voting at any regularly convened meeting of the Institute provided 





notice of such proposed amendment shall have been sent to each Member by 
the Secretary at least thirty days before the date of the meeting at which the 
proposal is to be acted upon. Members may vote in person or by mail. The 
Secretary shall send to the Members any amendments recommended by the 
Executive Committee or proposed through a petition of 25 members of the 
Institute. 

ARTICLE 13 

Emergencies 

In an emergency, as determined by the President or the Executive Committee, 
or by a majority of the Council, a meeting of the Council to transact business 
or a meeting of the Institute to amend the constitution may be conducted by 
mail. 

BY-LAWS OF THE INSTITUTE OF MATHEMATICAL STATISTICS 

ARTICLE 1 
Duties of Officers

The President, or in his absence the President-Elect, or in his absence a
Member appointed by the Executive Committee, shall preside at business meetings
of the Institute.

The Treasurer shall send out calls for annual dues, pay all bills for expenditures 
authorized by the Institute, Council, or Executive Committee; keep a detailed 
account of all receipts and expenditures; prepare a financial statement at the 
end of each fiscal year and present an abstract of same at a business meeting of 
the Institute after it has been audited by a Member or Members appointed by
the President, to whom such Member or Members shall report. 

The Secretary shall, subject to the direction of the Council, have charge of
the archives and other tangible and intangible property of the Institute and shall, 
upon the direction of the Council, publish a classified list of all Members of the 
Institute, and of Institutional Members at their request. 

The Editor, subject to the direction of the Council, shall have charge of all
editorial matters, whether relating to the official Journal or to other publications.
He shall, with the advice and consent of the Council, appoint an Editorial
Committee of not less than twelve Members to cooperate with him for definite terms.
All appointments to the Editorial Committee shall terminate with the
appointment of a new Editor.


ARTICLE 2 
Dues

Members shall pay seven dollars at the time of admission to membership and 
shall receive the full current volume of the official Journal. Thereafter Members 
shall pay seven dollars annual dues, of which five dollars shall be for a
subscription to the Official Journal. There shall be the following exceptions:





A. Two Members of the Institute who are husband and wife may elect to 
receive one copy of the Official Journal between them, when their dues 
shall each be reduced by twenty-five percent. 

B. Any Member may make a payment in place of all succeeding annual dues 
based on a suitable table and rate of interest specified by the Council. 

C. Any Member on active military duty may notify the Treasurer that he 
wishes neither to pay dues nor to receive the Official Journal during the 
current year. He may receive the official Journal for the suspended 
years on payment of one-half of the suspended dues within one year after 
resuming payment of annual dues. 

D. Any Member who resides outside the Western Hemisphere shall pay five 
dollars annual dues. 

Institutional Members shall pay annual dues of at least $100. For each $100 
of annual dues, an Institutional Member shall receive two copies of the Official 
Journal, one bound, and shall be entitled to designate one person to have the 
full prerogatives of a member without further payment of dues (including the 
receipt of a personal copy of the Official Journal). Twenty-five dollars of each 
$100 shall be allocated to the three subscriptions to the Official Journal and the 
binding of one copy. 

Annual dues shall be payable on the first day of January of each year. 

It shall be the duty of the Treasurer to notify by mail anyone whose dues are 
six months in arrears, enclosing a copy of this article. If such person fails to 
pay such dues within three months from the date of mailing such notice, the 
Treasurer shall report the delinquent to the Council, who may suspend the 
delinquent from membership and who may reinstate the delinquent upon
payment of arrears.


ARTICLE 3 
Salaries 

The Institute shall not pay a salary to any Officer, Councilor, or member of 
any committee. 


ARTICLE 4 
Amendments 

These Bylaws may be amended in the same manner as the Constitution or, 
if the proposed amendment has been previously approved by the Council, by a
majority vote at any regularly convened meeting. 



JOURNAL OF THE AMERICAN
STATISTICAL ASSOCIATION

DECEMBER, 1948
Articles

Commercial Uses of Sampling. J. Stevens Stock and Joseph R. Hochstim

Variation of the Frequency of Fatal Quarrels with Magnitude. Lewis F. Richardson

Bank Reserves and Business Fluctuations. Clark Warburton

The Ordering of n Items Assigned to k Rank Categories by Votes of m Individuals. Garret L. Schuyler

Levels of Significance for Variance Ratio of Two Samples of Equal Size. C. J. Kirchen

Main Effects and Interactions. D. J. Finney

A Test for Symmetry in Contingency Tables. Albert H. Bowker

The War Production Board’s Statistical Reporting Experience, Part IV. David Novick and George A. Steiner

Correction to “On Estimating Precision of Measuring Instruments and Product Variability”. Frank E. Grubbs

Statistical Methodology Index. Oscar Krisen Buros

AMERICAN STATISTICAL ASSOCIATION
1603 K Street, N. W., Washington 6, D. C.


MATHEMATICAL REVIEWS 

A journal containing reviews of the mathematical literature
of the world, with full subject and author indices

Publication of this journal is sponsored by the American Mathematical
Society, Mathematical Association of America, Institute of
Mathematical Statistics, London Mathematical Society, Edinburgh
Mathematical Society, Unión Matemática Argentina, and others.

Subscriptions accepted to cover the calendar year only.

Issues appear monthly except July. $20.00 per year.

Send subscription order or request for sample copy to


AMERICAN MATHEMATICAL SOCIETY
531 West 116th Street, New York City 27










ON THE THEORY OF SYSTEMATIC SAMPLING, II
By William G. Madow¹

Institute of Statistics, University of North Carolina 

1. Summary and introduction. In an earlier paper² [1] an approach to the
problem of systematic sampling was formulated, and the associated variance
obtained. Several forms of the population were assumed. The efficiency of the
systematic design as compared with the random and stratified random design
was evaluated for these forms. It was remarked that as the size of sample increased
the variance of a systematic design might also increase, contrary to the
behavior of variances in the random sampling design. This possibility was verified
in [2].

One approach to the study of systematic designs, given by Cochran [3] removed 
this difficulty to some extent by changing the problem to one of the expected 
variance, and supposing the elements of the population to be random variables. 
He showed that if the correlogram of these random variables is concave upwards, 
then the expected variance of the systematic design would be less, and often 
considerably less, than the variance of a stratified design. 

In the present paper the results of the earlier papers are extended to the
systematic sampling of clusters of equal and unequal sizes. Some comments on
systematic sampling in two dimensions are included.

In section 2 we derive two theorems that have considerable applications in 
many parts of sampling. Although it has been common for people working in 
sampling theory to tell each other that these theorems ought to be true, yet no 
reference seems to exist.

In section 3 we develop the implications of a remark [1, p. 13] that in designing 
sample surveys we should try to induce negative correlation between strata. In 
Theorem 3 we obtain sufficient conditions for the correlation to be negative. 
The lemma and Theorem 4 given in Section 4 enable us to extend the uses of 
Theorem 3 in practice. As an application of these results, we show that if a 
population has a concave upwards correlogram, and if strata are defined in an 
optimum fashion for the selection of one element at random from each stratum, 
then we can define a systematic type design that will be more efficient than
independent random selection from each stratum. 

In sections 5 and 6 we obtain various results in the systematic sampling of
clusters largely as applications of the more general theorems of the earlier
sections. In general the results are of a nature similar to those of [1] and [3] in that
the formulae show the conditions under which systematic sampling may be 
expected to be more efficient than random or stratified random sampling. We 
have not, however, applied these formulae to specified types of populations. 

¹ Submitted for publication, November, 1948. Parts of this paper were prepared while
the author was Visiting Professor of Statistics at the University of São Paulo, Brazil.

² References to the articles and book cited are given by Roman numerals.



From [1, 2 and 3] it is already apparent that this work will be useful and such 
studies should be more valuable when made in connection with important types
of surveys or data than when made as illustrations in a general paper. 

2. Random events and conditional expectations. Almost invariably, samples 
are selected in several stages. For example, to select a sample of households from 
a city one frequently used method is the following two stage sampling plan: 

a. A map of the city showing the location of each block is obtained and 
brought up-to-date. 

b. Using this map, a sample of the blocks of the city is selected (this is stage 1).

c. From the households on the blocks selected in stage 1, a subsample of
households is selected (this is stage 2).

In this section, we give a general approach for evaluating the means and
variances associated with multi-stage sampling. This approach has the
advantage of at once yielding the contributions to the variance arising from
each stage. Furthermore, the theorems presented are useful in calculating
variances even when our interest is not in multi-stage sampling. The theorems are
presented in general terms because of their wide application in sampling.

We shall say that the result of performing an operation is a random event $A^*$
if the result can assume $m$ possible states $A_1, \ldots, A_m$ with probabilities
$p_1, \ldots, p_m$, where

$$P\{A^* = A_i\} = p_i, \qquad \sum_{i=1}^{m} p_i = 1,$$

and $P\{A^* = A_i\}$ is read “the probability that the random event $A^*$ assumes
the state $A_i$.”

One illustration of an operation is the operation of selecting a sample of blocks.
If there are $N$ blocks in the city of which we select $n$ in such a way that each
set of $n$ of the $N$ blocks is a possible sample, then there are $C_n^N$ possible samples.
In this case $m = C_n^N$ and the $C_n^N$ possible samples are the $m$ states of $A^*$, “the
result of selecting the sample of blocks.” Furthermore, if each of the possible
samples of blocks is equally likely to be selected, then

$$P\{A^* = A_i\} = 1/C_n^N, \qquad i = 1, \ldots, m.$$



The random event $A^*$ may also be the taking on by a random variable of
one of its possible values. If $z^*$ is a random variable having possible values
$z_1, \ldots, z_m$ with probabilities $p_1, \ldots, p_m$ then we can define the states of
$A^*$ to be $A_i$ where $A_i$ is “$z^* = z_i$.”

Thus the notion of a random event includes the two types of randomness 
that are met in selecting samples. 
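As an illustrative sketch (not part of the original paper), the first type of random event can be enumerated directly. The function name `sample_states` and the sizes $N = 5$, $n = 2$ are assumptions chosen only for the example; each listed sample is one state $A_i$, and equal likelihood gives each state probability $1/m$.

```python
from itertools import combinations
from math import comb

def sample_states(N, n):
    """List every possible sample of n blocks out of blocks labelled 1..N."""
    return list(combinations(range(1, N + 1), n))

# Assumed illustrative sizes: N = 5 blocks, n = 2 blocks per sample.
N, n = 5, 2
states = sample_states(N, n)      # the m states A_1, ..., A_m
m = len(states)                   # should agree with the binomial coefficient
probabilities = [1 / m] * m       # equally likely samples: p_i = 1/m
print(m, comb(N, n))
```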

Let $x'$ be a random variable. Then, by the conditional expectation of $x'$ subject
to the random event $A^*$ is meant the random variable $E^*(x' \mid A)$ whose possible
values are $E(x' \mid A_i)$, $i = 1, \ldots, m$, and whose probabilities are $p_i$, that is

$$P\{E^*(x' \mid A) = E(x' \mid A_i)\} = p_i = P\{A^* = A_i\},$$

where

$$E(x' \mid A_i) = \sum_{j=1}^{N_i} x_{ij}\, p_j(A_i), \tag{2.1}$$

$x_{ij}$ is the $j$th of the $N_i$ possible values of $x'$ when $A_i$ occurs, and

$$p_j(A_i) = P\{x' = x_{ij} \mid A_i\}$$

is “the probability that $x' = x_{ij}$ given that $A_i$ occurs.” It should be noted
that if

$$p_{ij} = P\{x' = x_{ij}\},$$

then

$$p_{ij} = P\{x' = x_{ij},\ A^* = A_i\}$$

since the fact that $x' = x_{ij}$ implies the occurrence of $A_i$. Then

$$p_i\, p_j(A_i) = p_{ij}. \tag{2.2}$$

We state Theorems 1 and 2 without proof since their proofs are immediate.
Theorem 1. The expected value of the random variable $E^*(x' \mid A)$ is $Ex'$, i.e.

$$E[E^*(x' \mid A)] = Ex'.$$

By $\sigma^*_{x'y' \mid A}$ we shall mean the random variable whose possible values are
$\sigma_{x'y' \mid A_i}$, $i = 1, \ldots, m$, where

$$\sigma_{x'y' \mid A_i} = E\{[x' - E(x' \mid A_i)][y' - E(y' \mid A_i)] \mid A_i\}$$

and

$$P\{\sigma^*_{x'y' \mid A} = \sigma_{x'y' \mid A_i}\} = p_i = P\{A^* = A_i\},$$

i.e.

$$\sigma^*_{x'y' \mid A} = E^*\{[x' - E^*(x' \mid A)][y' - E^*(y' \mid A)] \mid A\}.$$

Furthermore, the symbol $\sigma_{E^*(x' \mid A)\,E^*(y' \mid A)}$ will stand for “the covariance of the
two random variables $E^*(x' \mid A)$ and $E^*(y' \mid A)$.” The corresponding definitions
of variance are obtained by replacing $y'$ by $x'$ above.

Theorem 2. If $x'$ and $y'$ are random variables, then

$$\sigma_{x'y'} = E\sigma^*_{x'y' \mid A} + \sigma_{E^*(x' \mid A)\,E^*(y' \mid A)}$$

and

$$\sigma^2_{x'} = E\sigma^{*2}_{x' \mid A} + \sigma^2_{E^*(x' \mid A)}.$$

We note that, since the $p_{ij}$, $p_i$ and $p_j(A_i)$ are not specified, Theorems 1 and
2 are valid for any two-stage plan. The generalizations of Theorems 1 and 2
to multi-stage plans are obvious, but in practice it often turns out to be simpler
to apply the theorems several times.
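A small numerical check may make Theorems 1 and 2 concrete. The sketch below is only an illustration under assumed data: the two-stage design in `design`, and all function names, are hypothetical and are not taken from the paper. Exact rational arithmetic is used so that the two sides of each identity can be compared without rounding.

```python
from fractions import Fraction as F

# Hypothetical two-stage design (all numbers are assumptions for illustration).
# Each first-stage state A_i has probability p_i; given A_i, the pair (x', y')
# takes the listed values with the listed conditional probabilities.
design = [
    (F(1, 2), [((1, 4), F(1, 2)), ((3, 2), F(1, 2))]),
    (F(1, 3), [((5, 1), F(1, 4)), ((7, 0), F(3, 4))]),
    (F(1, 6), [((2, 2), F(1, 1))]),
]

def conditional_moments(outcomes):
    """Return E(x'|A_i), E(y'|A_i) and the conditional covariance."""
    ex = sum(q * x for (x, _), q in outcomes)
    ey = sum(q * y for (_, y), q in outcomes)
    cov = sum(q * (x - ex) * (y - ey) for (x, y), q in outcomes)
    return ex, ey, cov

def overall_moments(design):
    """Return Ex', Ey' and the covariance computed directly from the joint law."""
    joint = [(p * q, x, y) for p, outs in design for (x, y), q in outs]
    ex = sum(pr * x for pr, x, _ in joint)
    ey = sum(pr * y for pr, _, y in joint)
    cov = sum(pr * (x - ex) * (y - ey) for pr, x, y in joint)
    return ex, ey, cov

def decomposition(design):
    """Return E[E*(x'|A)], the mean conditional covariance, and the
    covariance of the two conditional expectations."""
    parts = [(p,) + conditional_moments(outs) for p, outs in design]
    mean_x = sum(p * ex for p, ex, _, _ in parts)
    mean_y = sum(p * ey for p, _, ey, _ in parts)
    within = sum(p * cov for p, _, _, cov in parts)
    between = sum(p * (ex - mean_x) * (ey - mean_y) for p, ex, ey, _ in parts)
    return mean_x, within, between

ex, ey, cov = overall_moments(design)
mean_x, within, between = decomposition(design)
print(mean_x == ex, within + between == cov)
```

The `within` term is the contribution of the second stage and the `between` term that of the first stage, which is the separation by stage that the section describes.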





It would be easy to give applications of Theorems 1 and 2 but these are not 
essential for our purposes in this paper. As remarked in the introduction, these 
two theorems have long been part of what we may call the folklore of sampling. 

3. Stratified sampling and negative correlation, with an application to systematic
sampling. In discussing plans for sampling from a stratified population
it is customary to suppose that if $x'$ is an estimate and $x' = x_1 + \cdots + x_L$,
where $x_j$ is the contribution to $x'$ arising from the $j$th of the $L$ strata, then the
sampling is to be so done that the random variables $x_i$ and $x_j$, $j \neq i$, are
independent.

In [1, p. 13] it was noted that if a population were stratified, and if the elements 
were so selected that the contributions from different strata were negatively 
correlated, it would follow that the variance of the estimate would be less than 
if the contributions were independent but had the same covariances within 
strata. This was, of course, an immediate conclusion from the fact that

$$\sigma^2_{x'} = \sum_{i,j=1}^{L} \sigma_{x_i x_j}$$

and, hence, if

$$C = \sum_{i \neq j} \sigma_{x_i x_j} < 0 \tag{3.1}$$

then $\sigma^2_{x'}$ is less than it would be if $C = 0$. If $C < 0$ we shall say that the sample
design has “negative correlation.”

It is obvious that any population may be taken to be itself a sample, a sample 
from the possible populations that might have been produced by the forces that 
determined the existing population. Inasmuch as sampling designs are often
chosen on the basis of a knowledge of the dominating forces and some past 
experience, it is realistic to consider not only the expected values and variances 
for a specific population but also their expected values over all possible
populations determined by the same forces. Cochran [3] has given one illustration of
the usefulness of considering the expected variance of a sample design. He 
considered the elements $x_1, \ldots, x_N$ of the population themselves to be random
variables and supposed that $Ex_i = \mu$ and $E(x_i - \mu)^2 = \sigma^2$. For his purposes
it was also convenient to suppose that if $u > 0$ then $E(x_i - \mu)(x_{i+u} - \mu) = \rho_u \sigma^2$.
It was then possible for him to make realistic hypotheses concerning the
correlogram, i.e. the $\rho_u$ considered as a function of $u$, that would not have been
reasonable in dealing with a specific population. He thus obtained general 
conclusions concerning the expected efficiency of systematic sampling designs 
as compared with random and stratified random designs. 

In this paper we shall consider not only the expected values and variances
for the given finite population but also the expected values of these expected
values and variances under the assumption that the elements of the population
are themselves random variables. We shall use $\mathcal{E}$ to denote the expected value
considering the elements of the population to be random variables and as before
use $E$ for expected values based on the specified finite population.

Then

$$\mathcal{E}\sigma^2_{x'} = \sum_{i,j=1}^{L} \mathcal{E}\sigma_{x_i x_j}$$

and if $\mathcal{E}C < 0$ we shall say that the design has “expected negative correlation.”

We now propose to obtain the beginnings of an approach to sample design 
when it is possible to introduce or take advantage of negative correlation or 
expected negative correlation through the sample design. 

To simplify, we shall begin by considering two strata and shall suppose that
the possible values of $x'$ are $x_1, \ldots, x_n$ while the possible values of $y'$ are
$y_1, \ldots, y_n$. Furthermore, we shall suppose the sampling to be so done that

$$P\{x' = x_i\} = P\{y' = y_i\} = P\{x' = x_i,\ y' = y_i\} = p_i > 0,$$

so that $\sum_{i=1}^{n} p_i = 1$ and $P\{x' = x_i,\ y' = y_j\} = 0$ if $i \neq j$.

Under the above assumptions, it follows that

$$\sigma_{x'y'} = \sum_{i=1}^{n} p_i x_i y_i - \sum_{i,j=1}^{n} p_i p_j x_i y_j. \tag{3.2}$$

The symbol $a_{ij} \gneq 0$ means that $a_{ij} \geq 0$ for all $i$ and $j$ and $a_{ij} > 0$ for at least
one pair $i, j$. We shall say that if $(x_i - x_j)(y_i - y_j) \gneq 0$ then the sets $(x)$ and
$(y)$, where $(x)$ stands for $x_1, \ldots, x_n$ and $(y)$ for $y_1, \ldots, y_n$, are similarly ordered
and if $(x_i - x_j)(y_i - y_j) \lneq 0$ then these sets are oppositely ordered. Then it
is easy to prove [4, p. 43] directly that if the values are oppositely ordered, then
$\sigma_{x'y'} < 0$ and if they are similarly ordered then $\sigma_{x'y'} > 0$.
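The sign statement can be tried on a small case. The following is a minimal sketch, assuming illustrative values and probabilities that do not come from the paper; `paired_covariance` is a hypothetical helper that evaluates the covariance of two strata when the $i$th values are always selected together with probability $p_i$.

```python
from fractions import Fraction as F

def paired_covariance(xs, ys, ps):
    """Covariance of x' and y' when x' = x_i and y' = y_i occur together
    with probability p_i: sum p_i x_i y_i minus sum p_i p_j x_i y_j."""
    n = len(xs)
    t = sum(ps[i] * xs[i] * ys[i] for i in range(n))
    b = sum(ps[i] * ps[j] * xs[i] * ys[j] for i in range(n) for j in range(n))
    return t - b

# Assumed example data.
ps = [F(1, 5), F(1, 2), F(3, 10)]
xs = [1, 2, 4]
cov_opposite = paired_covariance(xs, [9, 5, 0], ps)   # y decreases as x increases
cov_similar = paired_covariance(xs, [0, 5, 9], ps)    # y increases with x
print(cov_opposite < 0, cov_similar > 0)
```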

A somewhat more general result is the following: 

Theorem 3. Let $n \leq k$, let

$$b = \sum_{i=1}^{n} \sum_{j=1}^{k} a_{ij} w_i z_j$$

be a real bilinear form, and let

$$t = \sum_{i=1}^{n} a_{ii} w_i$$

be a real linear form, where $w_i > 0$, $z_j > 0$ and $\sum_{i=1}^{n} w_i = \sum_{j=1}^{k} z_j = 1$.

Then a sufficient condition that $b > t$ is

$$a_{ij} > a_{ii}, \qquad j \neq i. \tag{3.3}$$

If $k = n$ and $w_i = z_i$, then $b > t$ if

$$a_{ij} + a_{ji} > a_{ii} + a_{jj}, \qquad i \neq j. \tag{3.4}$$





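Proof. Since

$$b - t = \sum_{i=1}^{n} a_{ii}(w_i z_i - w_i) + \sum_{i \neq j} a_{ij} w_i z_j,$$

and since

$$1 - z_i = \sum_{j \neq i} z_j,$$

it follows that

$$b - t = \sum_{i \neq j} (a_{ij} - a_{ii})\, w_i z_j.$$

Hence, $b > t$ if (3.3) holds. Also, if $k = n$ and $w_i = z_i$ then $b > t$ if (3.4) holds.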
Some obvious generalizations of Theorem 3 have been omitted since we do not 
need them. 
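The identity used in the proof of Theorem 3 can be checked numerically. This is a hedged sketch only: the sizes $n = 3$, $k = 5$, the random seed, and the way the matrix is generated (each off-diagonal entry exceeds the diagonal entry of its row, which is one way to satisfy the first sufficient condition) are all assumptions made for illustration.

```python
import random

def bilinear(a, w, z):
    """b = sum_i sum_j a[i][j] * w[i] * z[j]."""
    return sum(a[i][j] * w[i] * z[j] for i in range(len(w)) for j in range(len(z)))

def linear(a, w):
    """t = sum_i a[i][i] * w[i]."""
    return sum(a[i][i] * w[i] for i in range(len(w)))

def normalized(values):
    """Scale positive weights so that they sum to one."""
    total = sum(values)
    return [v / total for v in values]

rng = random.Random(0)            # fixed seed so the run is repeatable
n, k = 3, 5                       # assumed sizes with n <= k
w = normalized([rng.uniform(0.1, 1.0) for _ in range(n)])
z = normalized([rng.uniform(0.1, 1.0) for _ in range(k)])
diag = [rng.uniform(-1.0, 1.0) for _ in range(n)]
# Off-diagonal entries of row i are strictly larger than a[i][i].
a = [[diag[i] + (0.0 if j == i else rng.uniform(0.1, 1.0)) for j in range(k)]
     for i in range(n)]

b, t = bilinear(a, w, z), linear(a, w)
# Right-hand side of the identity: sum over i != j of (a_ij - a_ii) w_i z_j.
gap = sum((a[i][j] - a[i][i]) * w[i] * z[j]
          for i in range(n) for j in range(k) if j != i)
print(abs((b - t) - gap) < 1e-12, b > t)
```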

To obtain the result that $\sigma_{x'y'} < 0$ if the sets $(x)$ and $(y)$ are oppositely ordered,
we make the identifications $a_{ij} = x_i y_j$ and $z_i = w_i = p_i$. Then (3.4) holds and
substituting we have

$$a_{ii} + a_{jj} - a_{ij} - a_{ji} = (x_i - x_j)(y_i - y_j) \tag{3.5}$$

so that if the values are oppositely ordered, $\sigma_{x'y'} < 0$, and hence the two strata
have negative correlation.

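To consider expected negative correlation we note that

$$\mathcal{E}\sigma_{x'y'} = \sum_{i=1}^{n} p_i \sigma_{ii} - \sum_{i,j=1}^{n} p_i p_j \sigma_{ij} \tag{3.6}$$

where we suppose that $\mathcal{E}x_i = \mu$, $\mathcal{E}y_i = \nu$ and

$$\mathcal{E}(x_i - \mu)(y_j - \nu) = \sigma_{ij}$$

so that in this case $\sigma_{ii}$ is a covariance, not a variance.

If we put $a_{ij} = \sigma_{ij}$ and $z_i = w_i = p_i$, then (3.4) holds and we obtain, as
sufficient for $\mathcal{E}\sigma_{x'y'}$ to be negative, that

$$\sigma_{ij} + \sigma_{ji} > \sigma_{ii} + \sigma_{jj} \tag{3.7}$$

or, if we define $\rho_{ij}$ by the equation

$$\sigma_{ij} = \rho_{ij}\, \sigma_x \sigma_y,$$

where $\sigma_x^2 = \mathcal{E}(x_i - \mu)^2$ and $\sigma_y^2 = \mathcal{E}(y_i - \nu)^2$, we have

$$\rho_{ij} + \rho_{ji} > \rho_{ii} + \rho_{jj} \tag{3.8}$$

as a sufficient condition for $\mathcal{E}\sigma_{x'y'} < 0$.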

Let us consider the systematic sampling of single elements. In systematic
sampling, we assume a population of $kn$ ordered elements $x_1, x_2, \ldots, x_k$,
$x_{1+k}, \ldots, x_{2k}, \ldots, x_{1+(n-1)k}, \ldots, x_{nk}$ of which we wish to estimate the
arithmetic mean $\bar{x}$. As our estimate we use

$$\bar{x}' = (x_1' + \cdots + x_n')/n$$

where $x_1'$ is selected at random from $x_1, \ldots, x_k$ and if $x_1' = x_j$ then $x_i' = x_{j+(i-1)k}$,
$i = 2, \ldots, n$. Thus, $\bar{x}'$ may be interpreted as an estimate based on a stratified
population, the $i$th stratum consisting of

$$x_{1+(i-1)k}, \ldots, x_{k+(i-1)k}$$

and

$$P\{x_i' = x_{\alpha+(i-1)k}\} = P\{x_i' = x_{\alpha+(i-1)k},\ x_j' = x_{\alpha+(j-1)k}\} = 1/k$$

while

$$P\{x_i' = x_{\alpha+(i-1)k},\ x_j' = x_{\beta+(j-1)k}\} = 0, \qquad \text{if } \alpha \neq \beta.$$

Then

$$\sigma_{x_i'x_j'} = \frac{1}{k} \sum_{\alpha=1}^{k} x_{\alpha+(i-1)k}\, x_{\alpha+(j-1)k} - \bar{x}_i \bar{x}_j$$

where

$$\bar{x}_i = \frac{1}{k} \sum_{\alpha=1}^{k} x_{\alpha+(i-1)k}.$$

Hence, any two strata that are oppositely ordered will yield a negative contribution
to the variance. However, since it is not possible for all strata to be negatively
ordered, we do not thus obtain a useful result and must return to the
consideration of $C$ or $\sigma^2_{\bar{x}'}$ itself as was done in [1]. If, however, we make Cochran's
assumptions, and consider $\mathcal{E}\sigma_{x_i'x_j'}$, it follows that for the $i$th and $j$th strata

$$\rho_{\alpha\beta} = \rho_{(j-i)k+\beta-\alpha},$$

and (3.8) becomes

$$\rho_{(j-i)k+(\beta-\alpha)} + \rho_{(j-i)k+(\alpha-\beta)} > 2\rho_{(j-i)k}, \tag{3.9}$$

i.e. the correlation function $\rho_u$ must be concave upwards, which Cochran showed
by other means. By considering $\mathcal{E}C$ it is possible to show that a sort of average
concavity is all that is required of the correlogram for systematic sampling to
have a smaller variance than stratified random sampling.
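To illustrate what the concavity requirement asks of a correlogram, the sketch below tests the inequality $\rho_{u+d} + \rho_{u-d} > 2\rho_u$ at every lag $u$ that is a multiple of $k$ and every offset $d = 1, \ldots, k-1$. The two correlograms are assumed examples only (an exponential one, which is concave upwards, and a quadratic one, which is not); neither is taken from the paper, and the function names are hypothetical.

```python
import math

def concave_upwards_at_strata_lags(rho, k, n):
    """Check rho(u + d) + rho(u - d) > 2 * rho(u) for u = k, 2k, ..., (n-1)k
    and d = 1, ..., k - 1, i.e. for every pair of distinct strata."""
    for strata_gap in range(1, n):        # j - i
        u = strata_gap * k
        for d in range(1, k):             # |beta - alpha|
            if not rho(u + d) + rho(u - d) > 2 * rho(u):
                return False
    return True

def exponential_correlogram(u):
    """Assumed example: rho_u = exp(-0.3 u), concave upwards."""
    return math.exp(-0.3 * u)

def quadratic_correlogram(u):
    """Assumed example: rho_u = 1 - (u / 100)^2, concave downwards for u < 100."""
    return 1.0 - (u / 100.0) ** 2

k, n = 4, 5   # assumed: 5 strata of 4 elements each
print(concave_upwards_at_strata_lags(exponential_correlogram, k, n))
print(concave_upwards_at_strata_lags(quadratic_correlogram, k, n))
```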


4. Conditions for negative correlation when the strata are of unequal sizes
with an application to systematic sampling. Often, as in the systematic selection
of clusters with probability proportionate to size (discussed in Section 5) the
simplified situation dealt with in Theorem 3 does not directly apply. However,
Theorem 3 may be used to advantage by the following device.

Let us suppose the possible values of $x'$ to be $x_1, \ldots, x_n$ and those of $y''$ to be
$y''_1, \ldots, y''_k$, $k > n$, and let

$$P\{y'' = y''_\beta \mid x' = x_\alpha\} = p_{\beta \mid \alpha}$$

so that if we define

$$y_\alpha = \sum_{\beta=1}^{k} y''_\beta\, p_{\beta \mid \alpha} \tag{4.1}$$

then

$$y_\alpha = E(y'' \mid x' = x_\alpha).$$

If we define $y'$ to be a random variable having possible values $y_1, \ldots, y_n$
with probabilities $p_1, \ldots, p_n$ where

$$p_\alpha = P\{x' = x_\alpha\}$$

it follows that

$$y' = E^*(y'' \mid x')$$

and

$$\sigma_{x'y''} = \sigma_{x'y'}.$$

Clearly, Theorem 3 is valid for the random variables $x'$ and $y'$.

Consequently, we need only determine what restrictions the conditional
probabilities, $p_{\beta \mid \alpha}$, and the values, $y''_\beta$, need satisfy for the sets $x_1, \ldots, x_n$ and
$y_1, \ldots, y_n$ to be oppositely ordered or for (3.7) to hold.

Substituting for $y_i$ and $y_j$ in (3.5) we see that if

$$(x_\alpha - x_\gamma) \sum_{\beta=1}^{k} y''_\beta\, (p_{\beta \mid \alpha} - p_{\beta \mid \gamma}) < 0 \tag{4.2}$$

then $\sigma_{x'y''} < 0$.

Let

$$\sigma''_{\alpha\beta} = \mathcal{E}(x_\alpha - \mu)(y''_\beta - \nu).$$

Then substituting in (3.7) we see that if

$$\sum_{\beta=1}^{k} (p_{\beta \mid \alpha} - p_{\beta \mid \gamma})(\sigma''_{\alpha\beta} - \sigma''_{\gamma\beta}) < 0 \tag{4.3}$$

or if

$$\sum_{\beta=1}^{k} (p_{\beta \mid \alpha} - p_{\beta \mid \gamma})(\rho''_{\alpha\beta} - \rho''_{\gamma\beta}) < 0 \tag{4.4}$$

then $\mathcal{E}\sigma_{x'y''} < 0$.

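In order to use (4.2) and (4.3) the following well-known lemma is often useful.
Lemma. If $\xi_1 \leq \xi_2 \leq \cdots \leq \xi_k \leq 0$ and the quantities $\epsilon_1, \ldots, \epsilon_k$ are such that

$$\sum_{\beta=1}^{\kappa} \epsilon_\beta \geq 0, \qquad \kappa = 1, \ldots, k,$$

then

$$\sum_{\beta=1}^{k} \epsilon_\beta \xi_\beta \leq 0.$$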
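The lemma is a consequence of summation by parts, and a short sketch may help the reader see why. The numbers below are assumed for illustration, and `sum_by_parts` is a hypothetical helper: it rewrites $\sum \epsilon_\beta \xi_\beta$ in terms of the partial sums of the $\epsilon_\beta$ and the successive differences of the $\xi_\beta$, each product of which is non-positive under the hypotheses.

```python
def direct_sum(eps, xi):
    """sum over beta of eps_beta * xi_beta."""
    return sum(e * x for e, x in zip(eps, xi))

def sum_by_parts(eps, xi):
    """The same sum written with partial sums S_kappa of eps:
    sum_{kappa < k} S_kappa * (xi_kappa - xi_{kappa+1}) + S_k * xi_k."""
    total = 0
    partial = 0
    for kappa in range(len(eps) - 1):
        partial += eps[kappa]
        total += partial * (xi[kappa] - xi[kappa + 1])
    partial += eps[-1]
    return total + partial * xi[-1]

# Assumed example: partial sums of eps are 3, 2, 0, 1 (all non-negative)
# and xi is non-decreasing and non-positive.
eps = [3, -1, -2, 1]
xi = [-5, -4, -4, -1]
print(direct_sum(eps, xi), sum_by_parts(eps, xi))
```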

Let us use this lemma to obtain another theorem that will be helpful in showing 
negative or expected negative correlation between strata. 

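Theorem 4. Let $b$ be a bilinear form

$$b = \sum_{i=1}^{n} \sum_{j=1}^{m} a_{ij} w_i z_j$$

such that $\sum_{i=1}^{g} w_i \geq 0$, $\sum_{j=1}^{h} z_j \geq 0$, $g = 1, \ldots, n - 1$, $h = 1, \ldots, m - 1$, and

$$\sum_{i=1}^{n} w_i = \sum_{j=1}^{m} z_j = 0. \tag{4.5}$$

Let

$$\delta_{ij} = a_{ij} - a_{i,j+1} - a_{i+1,j} + a_{i+1,j+1}.$$

Then a sufficient condition that $b \leq 0$ is $\delta_{ij} \leq 0$.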

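Proof. Upon substituting for $w_n$ and $z_m$ in $b$ from (4.5) we see that

$$b = \sum_{i=1}^{n-1} \sum_{j=1}^{m-1} \delta'_{ij} w_i z_j$$

where

$$\delta'_{ij} = a_{ij} - a_{im} - a_{nj} + a_{nm}$$

or, if we define

$$\xi_j = \sum_{i=1}^{n-1} \delta'_{ij} w_i$$

then

$$b = \sum_{j=1}^{m-1} \xi_j z_j.$$

According to the lemma, it then follows that a sufficient condition that $b \leq 0$ is
that

$$\xi_1 \leq \xi_2 \leq \cdots \leq \xi_{m-1} \leq 0.$$

Also, a sufficient condition that

$$\xi_j - \xi_{j+1} \leq 0$$

is

$$\delta'_{ij} - \delta'_{i,j+1} \leq \delta'_{i+1,j} - \delta'_{i+1,j+1}.$$

Then to complete the proof it is only necessary to verify that

$$\delta_{ij} = \delta'_{ij} - \delta'_{i,j+1} - \delta'_{i+1,j} + \delta'_{i+1,j+1}.$$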
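One way to see Theorem 4 at work is to rewrite $b$ with the partial sums of the $w_i$ and $z_j$, which gives $b$ as a sum of second differences of the $a_{ij}$ weighted by products of partial sums. The sketch below checks this on assumed data; the values of `xs`, `ys`, `w`, `z` and the function names are hypothetical, and the partial-sum form is a standard rearrangement used here for illustration rather than the route taken in the proof above.

```python
from itertools import accumulate

def bilinear(a, w, z):
    """b = sum_i sum_j a[i][j] * w[i] * z[j]."""
    return sum(a[i][j] * w[i] * z[j] for i in range(len(w)) for j in range(len(z)))

def second_differences(a):
    """delta[i][j] = a[i][j] - a[i][j+1] - a[i+1][j] + a[i+1][j+1]."""
    n, m = len(a), len(a[0])
    return [[a[i][j] - a[i][j + 1] - a[i + 1][j] + a[i + 1][j + 1]
             for j in range(m - 1)] for i in range(n - 1)]

def via_partial_sums(a, w, z):
    """b rewritten as sum of delta[g][h] * W_g * Z_h, where W_g and Z_h are
    partial sums; valid when the w's and the z's each sum to zero."""
    delta = second_differences(a)
    W = list(accumulate(w))[:-1]
    Z = list(accumulate(z))[:-1]
    return sum(delta[g][h] * W[g] * Z[h]
               for g in range(len(W)) for h in range(len(Z)))

# Assumed example: a[g][h] = xs[g] * ys[h] with xs increasing and ys decreasing,
# so every second difference is negative.
xs, ys = [1, 3, 6], [10, 7, 5, 2]
a = [[x * y for y in ys] for x in xs]
w = [2, -1, -1]        # partial sums 2, 1 are non-negative; total is 0
z = [1, 1, -1, -1]     # partial sums 1, 2, 1 are non-negative; total is 0
b = bilinear(a, w, z)
print(b, via_partial_sums(a, w, z))
```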

In the preceding pages we have given an identification of systematic with 
stratified sampling where, instead of the selection being made independently 
within strata, the choice of an element from one stratum determines the choice 
from the other strata. In this identification, however, it was assumed that the 
strata contained the same number of elements. Let us now extend this method 
of selecting samples to the case where the strata have different numbers of 
elements. In so doing we shall illustrate the use of the above lemma and theorem 
4. 

Suppose now that the population consists of $N$ elements $x_1, \ldots, x_N$ classified
into $n$ strata, the $i$th of which contains the $N_i$ elements

$$x_{N_1+\cdots+N_{i-1}+1}, \ldots, x_{N_1+\cdots+N_i}.$$

We shall denote these elements by $x_{i1}, \ldots, x_{iN_i}$.

We shall select one element from each of these $n$ strata. The element selected
from the $i$th stratum is written $x_i'$. As the estimate of $\bar{x}$, the arithmetic mean of
the population, we use

$$\bar{x}' = \frac{1}{N} \sum_{i=1}^{n} N_i x_i'$$

and it is well known that if the selection is made independently at random from
each stratum, then

$$\sigma^2_{\bar{x}'} = \frac{1}{N^2} \sum_{i=1}^{n} N_i^2 \sigma_i^2$$

where $\sigma_i^2$ is the variance of $x_i'$, i.e. the variance of the $i$th stratum.

Let us now consider an alternative to the usual method. We can suppose that 
$N_1 > 1$ without any loss of generality. (The methods are the same for any stratum
having Ni = 1 and will also yield the same result for any population such that 
either all the Ni = 1 or all but one of the Ni = 1. Differences occur if at least two 
of the Ni differ from 1.) 

We first choose an element at random from the first stratum. Suppose that
$x_1' = x_{1\alpha}$. Then to choose an element from the second stratum, assuming that
$N_2 > 1$, we proceed as follows: Multiply $N_2$ by any positive integer $t_2$ such that
$N_2 t_2/N_1$ is an integer, say, $k_2$. Assign to each element of the second stratum the
measure of size $t_2$, and form the two sets of cumulative totals $t_2, 2t_2, \ldots, N_2 t_2$
and $k_2, 2k_2, \ldots, N_1 k_2$. Then with the measure of size $t_2$ assigned to each element
of stratum 2, and the measure of size $k_2$ assigned to each element of stratum 1,
it follows that strata 1 and 2 have the same total size.

As an example of the arithmetic given below consider the following simple case.
Suppose that $N_1 = 3$ and $N_2 = 4$. Then if we take for $t_2$ the value 6, it follows
that $k_2 = 8$. We choose one of the integers 1, 2, 3 with equal probability. If the
integer 1 is obtained, we have selected the first element of the first stratum and
choose an integer between 1 and 8 with equal probability. If the selected integer
is between 1 and 6, the first element of the second stratum is selected. If it is
7 or 8 the second element of the second stratum is selected. Similarly if the
second element of the first stratum is selected, then we select an integer between
9 and 16 with equal probability. If that integer has value 9, ..., 12 the second
element of the second stratum is selected; if it has value 13, ..., 16 the third
element is selected.

The general formulation of the selection procedure for the second stratum is:

Suppose that $\beta_0$ is the smallest integer such that $(\alpha - 1)k_2 + 1 \leq \beta_0 k_1$ and that
$\beta_1$ is such that $(\beta_1 - 1)k_1 < \alpha k_2 \leq \beta_1 k_1$. Choose an integer at random from
$1, \ldots, k_2$ and call that integer $\beta$. Then, if

$(\alpha - 1)k_2 < (\alpha - 1)k_2 + \beta \leq \beta_0 k_1$

the $\beta_0$th element is selected from stratum 2; if

$\beta_0 k_1 < (\alpha - 1)k_2 + \beta \leq (\beta_0 + 1)k_1$

the $(\beta_0 + 1)$th element is selected; ...; and if

$(\beta_1 - 1)k_1 < (\alpha - 1)k_2 + \beta \leq \alpha k_2$

the $\beta_1$th element is selected from stratum 2.
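
The selection rule above can be checked by direct enumeration. The following Python sketch is illustrative only and is not part of the original paper; the function names are hypothetical, and the code assumes that $k_1$ is the size given to each element of stratum 2 and that $k_2 = N_2 k_1 / N_1$ is the size given to each element of stratum 1, as in the worked example with $N_1 = 3$, $N_2 = 4$, $k_1 = 6$.

```python
from fractions import Fraction


def linked_selection(n1, n2, k1):
    """Joint probabilities of (stratum-1 element, stratum-2 element).

    Assumption: k1 is the size of each stratum-2 element and
    k2 = n2 * k1 / n1 is the size of each stratum-1 element.
    """
    if (n2 * k1) % n1 != 0:
        raise ValueError("n2 * k1 must be a multiple of n1")
    k2 = n2 * k1 // n1
    joint = {}
    for alpha in range(1, n1 + 1):        # element drawn from stratum 1
        for beta in range(1, k2 + 1):     # integer drawn from 1, ..., k2
            position = (alpha - 1) * k2 + beta
            chosen = -(-position // k1)   # ceiling: stratum-2 element covering position
            key = (alpha, chosen)
            joint[key] = joint.get(key, Fraction(0)) + Fraction(1, n1 * k2)
    return joint


def marginal_stratum2(joint, n2):
    """Selection probability of each element of stratum 2."""
    return [sum(p for (_, b), p in joint.items() if b == j)
            for j in range(1, n2 + 1)]


joint = linked_selection(3, 4, 6)
print(marginal_stratum2(joint, 4))
```

In this example every element of stratum 2 has marginal probability 1/4, although the joint probabilities are far from independent: the first element of stratum 1 can only be paired with the first or second element of stratum 2.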

It is easy to verify that when the sample is so selected, each element of stratum 
2 has equal probability of being selected. Hence, if we apply this procedure to 
each stratum we have 



Let us evaluate $\sigma_{x_i' x_j'}$ for this type of selection. Now

$\sigma_{x_i' x_j'} = E(x_i' - \bar{x}_i)(x_j' - \bar{x}_j)$

where $\bar{x}_i$ is the arithmetic mean of the elements of the $i$th stratum. From Theorem
2, we then have

$\sigma_{x_i' x_j'} = \frac{1}{N_1} \sum_{\alpha=1}^{N_1} E[x_i' - E(x_i' \mid x_{1\alpha})][x_j' - E(x_j' \mid x_{1\alpha})]
+ \frac{1}{N_1} \sum_{\alpha=1}^{N_1} [E(x_i' \mid x_{1\alpha}) - \bar{x}_i][E(x_j' \mid x_{1\alpha}) - \bar{x}_j].$

It is easy to see that the method of selection used above implies that the first
term of $\sigma_{x_i' x_j'}$ vanishes. Furthermore, $\bar{x}_i$ is the arithmetic mean of the conditional
expectations so that we have reduced the problem to one of determining whether
the conditional expectations satisfy the conditions for negative correlation or
expected negative correlation.

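If we denote $E(x_j' \mid x_{1\alpha})$ by $y_{j\alpha}$, then we need to see whether the sets
$y_{i1}, \ldots, y_{iN_1}$ and $y_{j1}, \ldots, y_{jN_1}$ are oppositely ordered. Now

$(y_{i\alpha} - y_{i\beta})(y_{j\alpha} - y_{j\beta}) = \sum_{g=1}^{N_i} \sum_{h=1}^{N_j} x_{ig} x_{jh} c_{ig\alpha\beta} c_{jh\alpha\beta}$

where

$c_{ig\alpha\beta} = P\{x_i' = x_{ig} \mid x_{1\alpha}\} - P\{x_i' = x_{ig} \mid x_{1\beta}\}.$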

If a < jS then, according to the method of selection, 

9 

»“I 

while 

Ki 

” 0* 

j-1 

In Theorem 4, we then make the identifications n ’= Nt ,m = N,, 


Wg — iigafi 9 and dgh ”■ Xig Xjh • » 

Then 

3«a = {xig - a:j.,+i)(x,A - xs,i,+i) 

and hence to have negative correlation between the strata, it is sufficient that 
the sets Xn, ••• , xm, and xy<, • • • , xjnj have the type of negative ordering 
represented by Sgh < 0. Similarly, if 

dgh ~ ^(,Xig ~ li^(Xfh “ My)> Mi “ &Xig , 
then, for expected negative correlation, it is sufficient that 

“ dg^+l — ffa+i.» + <r»+i,A+i < 0 . 

Of course, these conditions will be satisfied if a concave upwards correlogram
exists. Hence, if a population consists of $N$ random variables $x_1, \ldots, x_N$ having
a concave upwards correlogram, then, no matter into what strata these elements
are classified, provided that the order of occurrence of the elements remains
unaltered, the systematic selection of the elements in the sample can be so planned
as to yield an estimate having smaller variance than the stratified random
selection of the elements in the sample even if optimum allocation is used. If more
than one element is being selected from a stratum under optimum allocation,
then the systematic selection of the same number of elements will suffice. If not
only optimum allocation but also optimum definitions of strata are being used
so that but one element is selected from each stratum, then systematic selection
according to the scheme described in the section will produce a variance not
larger than the variance of stratified random sampling. It should be noted,
however, that this does not imply that a 'hammer and tongs' use of systematic
sampling ignoring the strata will produce a smaller variance. There is work to
be done on what is required for the latter to occur.





It may be noted that the procedure of this example provides an answer to the 
systematic selection of elements from a population whose size is not a multiple 
of the size of sample. 


5. The systematic sampling of clusters with probability proportionate to a
measure of size. It is known [5] that sampling clusters with probability
proportionate to a measure of size often yields considerable reductions in the variance
of the estimates. However, the theory of the systematic selection of several
clusters with probability proportionate to a measure of size has not been worked
out, and it is the purpose of this section to make some contributions to that
theory.

The most frequently used method of sampling clusters with probability
proportionate to size is equivalent to the following: Suppose that the clusters are
denoted by $C_1, \ldots, C_M$ and that to the $h$th of these $M$ clusters is assigned a
measure of size $P_h$. Form the successive totals $P_1, P_1 + P_2, P_1 + P_2 + P_3, \ldots,
P_1 + \cdots + P_M$. If we wish to select $m$ of these clusters, we calculate $P_m =
(P_1 + \cdots + P_M)/m$. Then, assuming that $P_j < P_m$, $j = 1, \ldots, M$, we select
an integer with equal probability from $1, \ldots, P_m$. Calling that integer $P'$, we
calculate the $m$ numbers $P', P' + P_m, P' + 2P_m, \ldots, P' + (m - 1)P_m$.
If

(5.1) $P_1 + \cdots + P_{h-1} + 1 \leq P' + (i - 1)P_m \leq P_1 + \cdots + P_h$

for any integer $i$, $i = 1, \ldots, m$, then the cluster $C_h$ is selected for the sample.
Any cluster for which $P_h > P_m$ is automatically included in the sample, and if
there are, say, $a$ such clusters, then we calculate $P_{m-a}$ for the $M - a$ clusters
remaining after including these $a$ in the sample, and proceed as above.
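
A minimal Python sketch of the selection rule (5.1) may make the procedure concrete. It is an illustration, not the author's own computation; the function name `pps_systematic` is hypothetical, and the code assumes integer measures of size whose total is an exact multiple of $m$ and that every $P_h$ is smaller than the interval $P_m$.

```python
from bisect import bisect_left
from itertools import accumulate


def pps_systematic(sizes, m, start):
    """Return the 1-based indices of the m clusters selected by rule (5.1).

    Assumptions: sizes are positive integers, sum(sizes) is a multiple of m,
    and start is the random integer P' drawn from 1, ..., P_m.
    """
    total = sum(sizes)
    if total % m != 0:
        raise ValueError("total size must be a multiple of m in this sketch")
    interval = total // m                      # P_m
    if not 1 <= start <= interval:
        raise ValueError("start must lie between 1 and the interval")
    cumulative = list(accumulate(sizes))       # P_1, P_1 + P_2, ...
    selected = []
    for i in range(m):
        target = start + i * interval          # P' + (i - 1) P_m in the text
        # first cluster whose cumulative total reaches the target
        selected.append(bisect_left(cumulative, target) + 1)
    return selected


sizes = [3, 5, 2, 6, 4]
print(pps_systematic(sizes, 2, 4))
```

Running the function over every possible start from 1 to $P_m$ selects cluster $h$ exactly $P_h$ times, which is the sense in which the selection is proportionate to the measure of size.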

In deriving the variance of the estimate we shall use, we interpret that estimate
as a stratified sampling estimate. Although it is easy to obtain the expected
value of the estimate without that interpretation, we shall need it later in the
derivation of the variance, and hence we give it here to shorten the total
presentation a little.

Suppose that clusters $C_1, \ldots, C_{k_1}$ are such that

$P_1 + \cdots + P_{k_1 - 1} < P_m \leq P_1 + \cdots + P_{k_1}.$

Then we define stratum 1 to consist of clusters $C_1, \ldots, C_{k_1}$. It is easy to see
that if the above sampling method is used then

$P\{C_h \text{ is selected from stratum 1}, h < k_1\} = \frac{P_h}{P_m},$

$P\{C_{k_1} \text{ is selected from stratum 1}\} = \frac{P_m - P_1 - \cdots - P_{k_1 - 1}}{P_m}.$

Furthermore, suppose that clusters $C_{k_1}, \ldots, C_{k_1 + k_2}$ are such that

$P_1 + \cdots + P_{k_1 + k_2 - 1} < 2P_m \leq P_1 + \cdots + P_{k_1 + k_2}.$

Then we define stratum 2 to consist of clusters $C_{k_1}, \ldots, C_{k_1 + k_2}$. It is easy to
see that if the above sampling method is used, then

$P\{C_{k_1} \text{ is selected from stratum 2}\} = \frac{P_1 + \cdots + P_{k_1} - P_m}{P_m},$

$P\{C_{k_1 + h} \text{ is selected from stratum 2}, 1 \leq h < k_2\} = \frac{P_{k_1 + h}}{P_m},$

$P\{C_{k_1 + k_2} \text{ is selected from stratum 2}\} = \frac{2P_m - P_1 - \cdots - P_{k_1 + k_2 - 1}}{P_m}.$

Since $P_{k_1} < P_m$ we remark that it is impossible that $C_{k_1}$ be selected from both
stratum 1 and stratum 2.

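In general, if clusters $C_{k_1 + \cdots + k_{i-1}}, \ldots, C_{k_1 + \cdots + k_i}$ are such that

(5.2) $P_1 + \cdots + P_{k_1 + \cdots + k_i - 1} < iP_m \leq P_1 + \cdots + P_{k_1 + \cdots + k_i}$

then the $i$th stratum consists of these $k_i + 1$ clusters, and we define the
probabilities $P_{i\alpha}$, $\alpha = 0, \ldots, k_i$, by the equations

(5.3)

$P_{i0} = P\{C_{k_1 + \cdots + k_{i-1}} \text{ is selected from stratum } i\}
= \frac{P_1 + \cdots + P_{k_1 + \cdots + k_{i-1}} - (i - 1)P_m}{P_m},$

$P_{i\alpha} = P\{C_h \text{ is selected from stratum } i,\ k_1 + \cdots + k_{i-1} < h < k_1 + \cdots + k_i\}
= \frac{P_h}{P_m}, \quad \alpha = h - k_1 - \cdots - k_{i-1},$

$P_{ik_i} = P\{C_{k_1 + \cdots + k_i} \text{ is selected from stratum } i\}
= \frac{iP_m - P_1 - \cdots - P_{k_1 + \cdots + k_i - 1}}{P_m}.$

We remark that

(5.4) $P_{i-1, k_{i-1}} + P_{i0} = \frac{P_{k_1 + \cdots + k_{i-1}}}{P_m}.$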

Now, let the elements of the population be $x_{hj}$, $h = 1, \ldots, M$, $j = 1, \ldots,
N_h$, and let the arithmetic mean of the $h$th cluster be denoted by $\bar{x}_h$. Since the
$N_h$ are usually unknown but the measure of size, $P_h$, is known, we sample, not
with probability proportionate to the $N_h$, but with probability proportionate
to the $P_h$. We shall denote the clusters of the $i$th stratum by $C_{i0}, \ldots, C_{ik_i}$,
making the identification

(5.5) $C_{i\alpha} = C_{\alpha + k_1 + \cdots + k_{i-1}}.$

Furthermore, the number of elements of the clusters are denoted by iVa, • • •, 





Niki, and the means of the clusters by ia, • • • , x»,, where 

Nia ** Na+ki+‘*<+ki^l 

(5.6) Xia * 

so that ia «= Xi-i,ki_i “ 1, • • • , m. 

Furthermore, we define 

(5.7) Xia * Nia Xia!^ia “ 1 

We define the mean of the ith stratum to be 


(5.8) X. = ^ PiaXia/Pm, 

a—0 

and the variance of the tth stratum to be 

(5.9) = i: iiia - Xi)\ 

a—0 rm 

Then, if the mean and variance of the population are defined to be 


(5.10) 


X = 2 PhXh/P 


(5.11) 

it is easy to verify that 

(5.12) 


^ (2a — xf, 

A-i r 


1 

X = — X Xi 
m. .-1 


(5.13) a“ = —— x)^ 

m i-i m i-i 

An unbiased estimate of the total of a characteristic. We shall see that we can
obtain an estimate of $x$, where

$x = \sum_{i=1}^{M} \sum_{j=1}^{N_i} x_{ij},$

i.e. $x$ is the total of the elements of the population. Since $N$ is unknown, the
estimate of $\bar{x}$ that is used is the ratio of unbiased estimates of $x$ and $N$. It is well
known that this ratio is usually biased. Since we are not making any study of
ratio estimates here we will not derive the approximation to the variance of this
estimate. It may be remarked that it can be obtained by a simple extension of
the results here given.





Let us agree that the general form of the estimate will be as follows:

If the $j$th cluster of the population is selected we shall subsample $n_j$ elements
from it. The total of the values of the characteristic for these $n_j$ elements we
denote by $x_j'$. Furthermore, we denote by $n_i'$ the total number of elements
subsampled from the $i$th stratum, or, what is the same, from the cluster selected
from the $i$th stratum; and by $x_i''$ the total of these elements. Thus, if the $j$th
cluster is the $i$th selected, then $n_i' = n_j$ and $x_i'' = x_j'$. We define our estimate $x''$
of $x$, the total of the population, to be

(5.14) $x'' = K(x_1'' + \cdots + x_m'').$

Then, if $K = P/mn$ and $n_h = nN_h/P_h$, it is easy to see that $x''$ is an
unbiased estimate of $x$.
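
The unbiasedness claim can be checked with exact arithmetic. The sketch below is an illustration under stated assumptions, not a derivation from the paper: it assumes that each cluster $h$ enters the sample with probability $mP_h/P$ (which holds for the systematic rule when every $P_h < P_m$), and that the subsample total has conditional expectation $n_h$ times the cluster mean, as it would under random subsampling. The function name is hypothetical, and $n_h = nN_h/P_h$ is treated as a rational number.

```python
from fractions import Fraction


def expected_total_estimate(cluster_totals, cluster_counts, sizes, m, n):
    """Exact expectation of x'' = K * (sum of subsample totals).

    Assumptions (not from the source): inclusion probability m * P_h / P,
    and E[subsample total | cluster h] = n_h * (cluster mean).
    """
    P = sum(sizes)
    K = Fraction(P, m * n)                      # K = P / (m n)
    expectation = Fraction(0)
    for X_h, N_h, P_h in zip(cluster_totals, cluster_counts, sizes):
        inclusion = Fraction(m * P_h, P)        # chance that cluster h is sampled
        n_h = Fraction(n * N_h, P_h)            # subsample size n_h = n N_h / P_h
        cluster_mean = Fraction(X_h, N_h)
        expectation += inclusion * n_h * cluster_mean
    return K * expectation


totals = [30, 45, 12]
print(expected_total_estimate(totals, [10, 15, 6], [8, 11, 5], m=2, n=4))
```

The measures of size cancel in each term, so the expectation equals the population total whatever the $P_h$ are; a poor measure of size affects the variance, not the bias.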

The variance of the estimate. We may calculate the variance of x" where 

(5.15) x" = P« {xi + ••• + Xn) and x7 = .x<7n. 


Now, by Theorem 2, 

(5.16) <Tx>' = Ea‘x>>\ji "H <r*»(*"|A) , 

where A* has been defined above. We shall not evaluate since this in¬ 

volves no new problem for subsampling methods using random or systematic 
methods, or methods using probability proportionate to size. 

From (5.15) it follows that 

(5.17) E*(x"\A ) = P„(x( +... + xL) 


or, in other words, E*{x"\ A) is the estimate we would have if the clusters in the 
sample were completely enumerated. We shall denote the second term of (5.16) 
by <r|. Then, 


(5.18) 


2 


= P* 

— * fl 


12 4: + E 

-1 


Now 

(5.19) 





2 

(Ti. 


To calculate cij;}, i s** j, we shall use Theorem 1. 

(5.20) ana = E(xi - fy)(j} - fy) = E{(2; - ti)E*[(i'f - Xy) I*;}. 

To calculate S*[(xy — xy) 1 i'i] we begin by noting that 

(5.21) E*[{$'i - f y) 1 fil a E*.[ii'i - f y) 1 Ci] 

where C? is the random event having fcj + 1 possible states which are the selec¬ 
tions of , • • •, Ciki as the sample clusters of the fth stratum. Now if C,-* is 
one of the clusters of the tth stratum let us calculate 


( 5 . 22 ) 


- ty) I Cia]. 





We begin by determining which of the clusters of the $j$th stratum are possible
sample clusters, if we know that $C_{i\alpha}$ is selected from the $i$th stratum. Since the
sizes of strata $i$ and $j$ are both $P_m$ it follows that there exist integers $\beta_0$ and $\beta_1$
such that

PjD + • • ' + Pift-l ^ Pa + • • • + Pi.a-l < Pjl + • • • + Piit , 
and 

Pfl + • • • + Pjfi-l < Pa + • • • + Pia < Pa + • • • + PjSi • 

Hence, if we know that Cta has been selected from stratum t, it follows that we 
must select one of the clusters 

from stratum j and 

P{Cjt is selected | C,,, is selected} = P'^IPia , |8 = iSo , A + 1, • • •, |3i 

= 0, otherwise, 

where 

PfSt — Pa + • • • + Pjfit ~~ Pa ) P •.«-i 

P% = P,@ , /3 = |8o + 1, • • • , -Bi — 1 

Pi$l — Pa + • • * + P,a — • — P ,3,-1 , 

and 

3i 

L Pi» = P,a 

Then 

(5.23) E[(S:'j — j C',«) = Xj|„ - Xj] 

where 

(5.27) = 

3-30 ri<x 

Hence, substituting in (5.20), we see that 

o-jJ;; = E(xi — ‘xiXxi\i - Xi) 

where ij|< = J/i« if C<o is selected from stratum i. Then it follows that 
(5.25) aSfi'f = X) ^ (®ia — Tii)(xj\a — 'Xf). 

OP-0 Jr m 

Obviously, the conditional expectation can be eliminated from (5.25) by using 
(5.23) but no gain in simplicity or generality thus occurs. 





It would be possible to obtain the variances and covariances of the x.' by 
listing all possible samples in any special case. To make this general would only 
require writing the necessary notation. 

Substituting in (5.18) we see that 

<rl = Pm Is H 




where is given by (5.25). 

m 

It follows that if we use the fact that 2 (^< — ®) = 0, then we have 


1-1 


<t\ = P2. £ £ W + - mjla - f), 

t»l m a-»0 fn 

or, returning in part to the "unstratified” notation 

(5.26) ffa = “ 2Z ^ (x* - x)‘ + — 22 £ ^- (xia - x)(xy|« - St). 

TH It Wi ifij i 

By combining terms of the second part of (5.26), generalizations of the formulae
obtained in [1] are easily obtained.

Still another means of writing <tb is 

(5.27) (Ts = — |(7* — 22 £^ {Stia — x)(i>|a-J)'l 

TH ^ a*»0 X J 

where 


• = — £ (Xi - Stf, 

m i-i 

which shows both sources of changes in efficiency as compared with sampling 
with probability proportionate to size, and replacing the clusters obtained. (It 
is, of course, obvious that P^&^lm is the variance of .E*(x" | 4), if we assume the 
m clusters to have been selected with probability proportionate to size, each 
selected cluster being replaced before the next is selected.) 

By considering (5.26) and (5.27) it is clear that systematic sampling with
probability proportionate to size will be more efficient than sampling p.p.s.
with replacement under much the same conditions as when we sample single
elements. The details are omitted. They depend on applying the Lemma and
Theorem 4. The summary of the conditions is: If we sample systematically with
p.p.s., and if the two sets $\bar{x}_{i0}, \ldots, \bar{x}_{ik_i}$ and $y_{j0}, \ldots, y_{jk_i}$ are monotone, one being
monotone non-increasing and the other monotone non-decreasing, then the
covariance between the $i$th and $j$th strata will be negative, and thus gains will be made as
compared with independent sampling from the strata.

If we define 

= €(Xf« — &Stia){Sjfi — SStjff) 





then the concavity condition for systematic sampling p.p.s. to yield a smaller 
variance than independent sampling p.p.s. from each stratum is, if a < 0, 


0 0^0 

s <^a2 “ 


<ry2 ^ 


< (Takf — (Tykf < 0 . 


6. The systematic sampling of clusters of equal sizes. Let us now suppose
that our population consists of clusters of elements, the clusters being of equal
size, i.e. containing the same number of elements. To be specific, let the
population consist of $M$ clusters, where $M = cm$ and each cluster contains $N$ elements,
where $N = kn$. Then, the value of the characteristic being measured for the
$\alpha$th element of the $i$th cluster may be denoted by $x_{i\alpha}$, and the total of all the
elements of the $i$th cluster may be denoted by $x_i$. The arithmetic mean of the
population is $\bar{x}$, and thus

$M\bar{x} = \sum_{i=1}^{M} \bar{x}_i,$

where

$N\bar{x}_i = x_i.$

a. Complete enumeration of clusters in sample. First, suppose that we wish to
estimate $\bar{x}$ by $\bar{x}'$, where $\bar{x}'$ is the arithmetic mean of the sample obtained by
selecting a systematic sample of $m$ of the $M$ clusters, and enumerating all elements
within each cluster in the sample. Then, we may write

(6.1) $\bar{x}' = \frac{1}{m} \sum_{i=1}^{m} \bar{x}_i'$

where $\bar{x}_i'$ is the mean of the $i$th cluster selected for the sample. From [1], it follows
then that

$\sigma_{\bar{x}'}^2 = \frac{\sigma_c^2}{m} \{1 + (m - 1)\bar{\rho}_c\}$

where $M\sigma_c^2 = \sum_{i=1}^{M} (\bar{x}_i - \bar{x})^2$, and $\bar{\rho}_c$ is defined as $\rho_k$ in [1, p. 6], but with $\bar{x}_i$ in
place of $x_i$. Now from the theory of the random sampling of clusters it follows
that

$\sigma_c^2 = \frac{\sigma^2}{N} \{1 + (N - 1)\rho\}$

where $\sigma^2$ is the variance of the population, i.e.

$MN\sigma^2 = \sum_{i=1}^{M} \sum_{j=1}^{N} (x_{ij} - \bar{x})^2$

and $\rho$ is the intraclass correlation coefficient of elements within clusters, i.e.

$\sigma^2 \rho = \sigma_c^2 - \sigma_w^2/(N - 1),$

where

$MN\sigma_w^2 = \sum_{i=1}^{M} \sum_{j=1}^{N} (x_{ij} - \bar{x}_i)^2.$

Thus

(6.2) $\sigma_{\bar{x}'}^2 = \frac{\sigma^2}{mN} \{1 + (N - 1)\rho\} \{1 + (m - 1)\bar{\rho}_c\}.$

Of the three factors in (6.2), $\sigma^2/mN$ is the variance of a random sample of size
$mN$ selected with replacement; $1 + (N - 1)\rho$ is the factor arising from the use
of clusters; and $1 + (m - 1)\bar{\rho}_c$ is the factor arising from the fact that the clusters
are sampled systematically.
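
The three factors of (6.2) are easy to tabulate numerically. The Python sketch below is only an illustration; the function name and the numerical values are hypothetical and are not taken from the paper.

```python
def variance_factors(sigma2, m, N, rho, rho_c):
    """Return the three factors of (6.2) and their product.

    sigma2: population variance; m: clusters in the sample; N: elements per
    cluster; rho: intraclass correlation; rho_c: average serial correlation
    among the means of systematically selected clusters.
    """
    base = sigma2 / (m * N)                 # random sample of m*N elements
    cluster_factor = 1 + (N - 1) * rho      # effect of using clusters
    systematic_factor = 1 + (m - 1) * rho_c  # effect of systematic selection
    return base, cluster_factor, systematic_factor, base * cluster_factor * systematic_factor


# Hypothetical values: a negative rho_c helps, but not enough to offset rho.
base, cluster, systematic, variance = variance_factors(4.0, 10, 5, 0.3, -0.05)
print(cluster, systematic, cluster * systematic, variance)
```

With these illustrative values the systematic factor is below 1 while the product of the two factors is still above 1, so the variance exceeds that of a random sample of $mN$ elements.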

b. Stratification and subsampling. When we consider the possibilities of stratifi¬ 
cation and subsampling, the number of possible designs increases tremendously. 
For example, it would be simple to calculate the variances of arithmetic means 
obtained by stratifying the population, selecting sampling units with probability 
proportionate to size, subsampling systematically, again subsampling systemat¬ 
ically and finally subsampling at random. However, such studies may be left 
to be made in connection with the practical problems in which they are to be 
used. Rather than attempt to consider many of the possibilities that might 
arise in practice, we shall here give only the results of the systematic subsampling 
of a systematic sample. The variances of many other designs may easily be ob¬ 
tained by means of Theorems 1 and 2. 

Suppose now that from each of a systematically selected sample of $m$ clusters
we subsample, systematically, $n$ elements. Then, let our estimate of $\bar{x}$ be $\bar{x}'$
where, if $x_{i\alpha}'$ is the $\alpha$th selected element from the $i$th sample cluster, then



From Theorem 2, it follows at once that

(6.3) $\sigma_{\bar{x}'}^2 = \frac{\sigma^2}{mN} \{1 + (N - 1)\rho\} \{1 + (m - 1)\bar{\rho}_c\} + \frac{1}{M} \sum_{i=1}^{M} \frac{\sigma_i^2}{mn} \{1 + (n - 1)\bar{\rho}_i\},$

where $\sigma_i^2$ is the variance within the $i$th cluster and $\bar{\rho}_i$ is the average serial
correlation within the $i$th cluster as defined in [1, p. 6]. It is simple to calculate the
variance of $\bar{x}'$ also when the sub-sampling is done by considering the $m$ clusters
in the sample as one population from which a systematic sample is selected. This
is the case that occurs when a sample of blocks is selected and all the households
on the sample blocks are listed serially, a systematic sample then being selected
from the lists. However, for our present purposes it is the analysis of (6.2) that
is important and we now turn to a brief discussion of (6.2).

The most important conclusion to be drawn from (6.2) is that the systematic
selection of clusters, even when systematic selection is desirable, may not
compensate for the increase in variance caused by the use of clusters. Systematic
selection will provide the same relative gains but these gains may not be large
enough to produce the inequality

$\{1 + (N - 1)\rho\}\{1 + (m - 1)\bar{\rho}_c\} < 1.$

A problem that we have not worked through is the following: By regarding
the elements of the population as random variables, we obtain conditions on the
average correlations among elements of a single cluster as well as on the average
correlations among elements of different clusters that enable us to state where the
systematic sampling of clusters of equal sizes may be expected to yield a smaller
variance than the random or stratified random sampling of clusters or of
individual elements. This solution should be straightforward.

c. Systematic sampling in two dimensions. Systematic sampling in two dimen¬ 
sions occurs in such practical problems as the selection of a sample of blocks from 
a city or the selection of a sample of plots from a field. 

In selecting blocks from a city, the procedure most often followed effectively
reduces the problem to one-dimensional form by first numbering the blocks of
the city, or a part of it, in serpentine fashion, beginning, say, in the upper right
corner of a map of the city and numbering the blocks in the top row from right
to left, continuing the numbering of the second row from left to right, and so on.
Then a systematic sample of these block numbers, and hence of the blocks
themselves, is selected. Clearly, this procedure should not be the most efficient
if neighboring blocks are highly correlated, since, to cite an unrealistic
possibility, the possible samples might turn out to be columns of blocks of the city.
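
The serpentine numbering and the degenerate case just mentioned can be sketched as follows. The code is illustrative only: the function names are hypothetical, the city is idealized as a rectangular grid of blocks, and row 0 is taken to be the top row.

```python
def serpentine_numbering(rows, cols):
    """List block positions (row, col) in serpentine order.

    Numbering starts in the upper right corner, runs right to left along the
    top row, left to right along the second row, and so on.
    """
    order = []
    for r in range(rows):
        columns = range(cols - 1, -1, -1) if r % 2 == 0 else range(cols)
        order.extend((r, c) for c in columns)
    return order


def systematic_sample(order, interval, start):
    """Every interval-th block, beginning with block number start (1-based)."""
    return order[start - 1::interval]


order = serpentine_numbering(4, 5)
# An interval of twice the row length always returns blocks from one column.
print(systematic_sample(order, 10, 1))
```

With an interval equal to twice the number of blocks in a row, every possible sample lies in a single column of the grid, which is the kind of alignment the text warns against when neighboring blocks are highly correlated.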

A second two dimensional systematic sampling procedure might be that of 
selecting a systematic sample of the rows and a systematic sample of the columns, 
thus obtaining a grid sample. This design too is inefficient when there is a “fer¬ 
tility gradient” along rows or along columns. 

The reason for the inefficiency of both of these procedures can be found by
examining the formulae for the variances of systematic samples. If the numbering
is serpentine, then it becomes illogical to expect that the correlogram is concave
upwards and sharp deviations from that pattern may occur. In the grid design,
which is a special case of the systematic sampling of clusters with systematic
subsampling, we may examine (6.3) and note that the intra-class correlation
coefficient $\rho$ may be large enough for $\sigma_{\bar{x}'}^2$ to be large even when $\bar{\rho}_c$ is negative.

Clearly, (6.3) suggests that the possible samples be so defined that $\rho$ is as
small as possible. In square fields this might be attained by defining the possible
samples to be plots of a Knut Vik square having the same treatment, and
similar definitions of possible samples could easily be given for irregular fields.
This subject is, however, left for further study.*

* One of the referees of this paper has drawn the author's attention to an article [6],
the data of which, especially Table 3, are in accordance with the opinions expressed above.





REFERENCES 

[1] W. G. Madow and L. H. Madow, "On the theory of systematic sampling, I," Annals
of Math. Stat., Vol. 15 (1944), pp. 1-24.

[2] L. H. Madow, "Systematic sampling and its relation to other sampling designs," Am.
Stat. Assn. Jour., Vol. 41 (1946), pp. 204-217.

[3] W. G. Cochran, "Relative accuracy of systematic and stratified random samples for a
certain class of populations," Annals of Math. Stat., Vol. 17 (1946), pp. 164-177.

[4] G. H. Hardy, J. E. Littlewood and G. Polya, Inequalities, Cambridge University
Press, London and New York, 1934, p. 43.

[5] M. H. Hansen and W. N. Hurwitz, "On the theory of sampling from finite
populations," Annals of Math. Stat., Vol. 14 (1943), pp. 333-362.

[6] P. G. Homeyer and C. A. Black, "Sampling replicated field experiments on oats for
yield determinations," Soil Sci. Soc. Proc., Vol. 11 (1946), pp. 341-344.



PROBLEMS IN PLANE SAMPLING 


By M. H. Quenouille 

Rothamsted Experimental Station, Harpenden, England 

1. Summary. After consideration of the relative accuracies of systematic and 
stratified random sampling in one dimension the problem of estimation of linear 
sampling error is discussed. 

Methods of sampling an area are proposed, and expressions for the accuracies 
of these methods are derived. These expressions are compared for large samples, 
with special reference to correlation functions which appear to be theoretically 
and practically justified, and systematic sampling is found to be more accurate 
than stratified random sampling in many cases. Methods of estimating sampling 
errors are again considered, and examples given. The paper concludes with 
some remarks on the problem of trend in the population sampled. 


2. Accuracy of systematic and stratified random samples in one dimension.
W. G. Cochran [1] has given expressions for the variances of the means of samples
of size $n$ drawn from a population $x_1, x_2, \ldots, x_{nk}$ when the method of sampling is
random ($r$), stratified random ($st$) and systematic ($sy$). He assumes the elements
$x_1, x_2, \ldots, x_{nk}$ to be drawn from a population in which

$E(x_i) = \mu, \quad E(x_i - \mu)^2 = \sigma^2, \quad E(x_i - \mu)(x_{i+u} - \mu) = \rho_u \sigma^2$

where $\rho_u \geq \rho_v \geq 0$ whenever $u < v$, and derives the expressions

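(1) $\sigma_r^2 = \frac{\sigma^2}{n} \left(1 - \frac{1}{k}\right) \left[1 - \frac{2}{kn(kn - 1)} \sum_{u=1}^{kn-1} (kn - u)\rho_u\right]$

(2) $\sigma_{st}^2 = \frac{\sigma^2}{n} \left(1 - \frac{1}{k}\right) \left[1 - \frac{2}{k(k - 1)} \sum_{u=1}^{k-1} (k - u)\rho_u\right]$

(3) $\sigma_{sy}^2 = \frac{\sigma^2}{n} \left(1 - \frac{1}{k}\right) \left[1 - \frac{2}{kn(k - 1)} \sum_{u=1}^{kn-1} (kn - u)\rho_u + \frac{2k}{n(k - 1)} \sum_{u=1}^{n-1} (n - u)\rho_{ku}\right].$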
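
Cochran's three expressions are linear in the $\rho_u$ and can be evaluated directly for any correlogram. The Python sketch below is an illustration, not part of the paper: the function name is hypothetical and the correlogram is passed in as a function of the lag $u$.

```python
def cochran_variances(rho, n, k, sigma2=1.0):
    """Variances of the sample mean under random, stratified random and
    systematic sampling, following expressions (1), (2) and (3).

    rho: function giving the correlation at lag u; n: sample size;
    k: sampling interval, so that the population has n * k elements.
    """
    N = n * k
    base = sigma2 / n * (1 - 1 / k)
    sum_all = sum((N - u) * rho(u) for u in range(1, N))
    sum_within = sum((k - u) * rho(u) for u in range(1, k))
    sum_spaced = sum((n - u) * rho(k * u) for u in range(1, n))
    var_r = base * (1 - 2 / (N * (N - 1)) * sum_all)
    var_st = base * (1 - 2 / (k * (k - 1)) * sum_within)
    var_sy = base * (1 - 2 / (N * (k - 1)) * sum_all
                     + 2 * k / (n * (k - 1)) * sum_spaced)
    return var_r, var_st, var_sy


# Exponential correlogram rho_u = 0.5 ** u with n = 2, k = 2 (four elements).
print(cochran_variances(lambda u: 0.5 ** u, n=2, k=2))
```

For this small exponential correlogram the systematic variance is the smallest and the random variance the largest; with a correlogram that is identically zero all three reduce to $(\sigma^2/n)(1 - 1/k)$.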


Using these expressions, which are linear functions of the $\rho_u$, Cochran compares
the relative efficiencies of the methods of sampling for several types of
correlogram. It is worth noting that (1), (2) and (3) can be derived under more general
conditions than Cochran considered. If we assume that (a) each x,- is a sample 
from a population with mean m and variance o\ , (b) that p,- is distributed about 
mean p with variance v*, (c) that E(jii — p) — p) = p<pr*, and (d) that 

j u 

Pit “ - 2 Pi.t+u, then it is not difficult to show that (1), (2) and (3) 

Kn — u 




require the addition of a superposed variation ^ 

hand side of the equations. Thus it should be remembered that Cochran’s 
results give theoretical maxima to the relative efficiencies of the various methods 
of sampling, while p„ is the mean correlation between samples u apart. This 
result is perhaps interesting in connection with sampling for say, insect infesta¬ 
tion, when at each point there will lie a mean level of infestation and the sample 
will be distributed in a Poisson distribution about this mean. Then the superposed 
variation is 



If we are sampling a continuous process', for n laige we can write down the 
integral equivalents of (1), (2) and (3) 


) 


kn 


7 —^ (A to the right- 
Kn <-i 


2 O' 

Or — 




n 



(4) 

2 

(TMt 

2 

G 

rs^ — 

n 

1 

1_1 

(d - u)p, 
Jo 

(5) 

2 

O' My 

2 

G 

rs^ — 

9 

f pudu + 2 


n 

L d J 

'o 


LpJ 

u-l J 


where pw is the mean correlation between successive elements of the sample, u 
apart and d is the mean distance between samples. We have thus 


0’s 


'9 / f d ^ pdu , 

ct l_ Jo a Jd w-i 


which can often be used to investigate, quickly and roughly, with the aid of a 
graph the difference between the efficiencies of stratified random and systematic 
sampling. Figure 1 shows how this is done for four types of correlogram. 

For a continuous Markoff scheme, we have pw = p" and 


2 

(^Mt 


2 

or gy 




2 

log P‘‘ 

2 

log 


+ 

+ 


(log p")* 

2p‘* 1 

1 - pd’ 


2p^ 


which agree with Cochran’s results. 


3. Replication and the estimation of error. Yates [2] has pointed out the
difficulties attached to the estimation of error for a systematic sample. It will,
however, be worthwhile to investigate this point using the above formulae.

‘ In practice we can sample a continuous process only as if it were a discontinuous process 
with k large. 




For random, stratified random and systematic sampling, if $n$ is large and $k$ is
regarded as constant, then the variance of the estimate of the mean will be of
the form $\sigma^2 F(k)/n$, where $F(k)$ is virtually independent of $n$. Thus, if we have
any method which provides an estimate of error for the samples it will be possible
to split the series to be sampled into several equal parts (or blocks), to obtain an
estimate of error of the mean of each part, and to combine these to obtain a more
accurate estimate of the error of the overall mean. In fact, if $n$ is very large, we
may wish to reduce our number of observations by obtaining estimates of error
from a random selection of these parts. For stratified random sampling, $F(k)$ is
completely independent of $n$, so that we may combine our estimates of error from
each stratum. This leads us to the commonly used method of taking $q$ randomly
chosen elements per stratum, and combining the sets of variances of $q - 1$ degrees
of freedom to form an estimate of error. If we make our samples exclusive,
i.e. no two elements can coincide, then this variance has to be multiplied by
$1 - q/k$ to give the estimated variance of the sample mean.
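
As a rough illustration of the stratified method just described, the following Python sketch pools the within-stratum variances and applies the factor $1 - q/k$. It rests on assumptions that go beyond the text: the strata are taken to be of equal size $k$ with $q \geq 2$ elements drawn from each, the pooled variance is divided by the total sample size to give the variance of the mean, and the function name is hypothetical.

```python
from statistics import mean, variance


def stratified_mean_and_variance(strata_samples, k):
    """Estimate the mean and its variance from q elements per stratum.

    strata_samples: list of samples, one per stratum, each of the same size q.
    k: number of elements in each stratum (assumed equal for all strata).
    """
    n_strata = len(strata_samples)
    q = len(strata_samples[0])
    # Each stratum contributes a variance with q - 1 degrees of freedom.
    pooled = mean(variance(sample) for sample in strata_samples)
    estimate = mean(mean(sample) for sample in strata_samples)
    # Factor 1 - q/k for samples drawn without repetition within a stratum.
    variance_of_mean = (1 - q / k) * pooled / (n_strata * q)
    return estimate, variance_of_mean


print(stratified_mean_and_variance([[1, 3], [2, 6], [5, 5]], k=4))
```

The example data are invented for illustration. With equal strata the estimate of the mean is simply the average of the stratum means.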

We can in the same way estimate the variance of the mean of a systematic
sample by using sets of $q$ systematic samples of sufficient length with
randomly-chosen starting points. This sampling will, however, be more difficult to carry
out in practice, and we might consider other methods. Our systematic samples
may be chosen to be invariable in each part or block into which the series is
split so that our sampling procedure involves, in all, only $q$ systematic samples,
or we might follow the method advocated by Yates of choosing our $q$ samples
to be evenly spaced, so that they are subsamples of a larger systematic sample.
Whereas this latter method has simplicity and its possible incorporation into a
more extensive scheme to recommend it, its use has to be very carefully
considered. If we consider the discrete case, we wish to estimate


( 6 ) 





but any estimate of variance based on $q$ evenly-spaced systematic samples can
contain only terms of the form $\rho_{ku/q}$, and while an estimate of variance based
on $q$ randomly-chosen systematic samples will obviously be limited, it will, in
most cases, be more representative. As an example, suppose we take $k = 16$
and $q = 4$; then we can compare the relative occurrences of observing the
correlations $\rho_1, \ldots, \rho_{15}$ in the estimate of variance. Six examples of this are given in
Table 1, the random numbers having been drawn from Fisher and Yates' tables;
$\rho_u$ and $\rho_{16-u}$ being shown together, since they occur equally frequently. The
table demonstrates how randomly-chosen samples, even as nearly systematic
as the first two randomly-chosen samples, will avoid systematically sampling the
correlogram. It is obvious that in most cases either method will be fairly good
but the use of this latter will usually be the more accurate. Comparisons are
made in Table 2 for various types of correlogram using the samples indicated
in Table 1. It is, of course, possible to postulate theoretically many kinds of


* Throughout this paper {is used for the differential sign to prevent confusion with d. 





correlogram for which the equal-spaced sets of systematic samples will break 
down, but ultimately we must decide with reference to the types of correlogram 

TABLE 1

Frequency of occurrence of the serial correlations $\rho_1, \rho_2, \ldots, \rho_{15}$ in the estimate of
variance when 4 systematic samples each with spacing 16 units are taken

             4 evenly-    4 systematic samples with random starting points at
             spaced                                                               Total
  $\rho_u$   systematic   4, 7,    3, 7,    3, 6,    4, 6,    2, 8,    2, 6,      fre-
  $u$ =      samples      8, 12    8, 12    10, 13   7, 14    11, 15   11, 16     quencies

  1, 15         -           1        1        -        1        -        -          3
  2, 14         -           -        -        -        1        -        1          2
  3, 13         -           1        -        2        1        2        -          6
  4, 12         4           2        2        1        -        1        1          7
  5, 11         -           1        2        -        -        -        2          5
  6, 10         -           -        -        1        1        1        1          4
  7, 9          -           -        1        2        1        2        1          7
  8             4           2        -        -        2        -        -          4


TABLE 2

Values of $\frac{2}{15} \sum_{u=1}^{15} \rho_u$ as estimated by systematic samples

                                   Evenly-       Systematic samples with random
                                   spaced        starting points
  $\rho_u$                         systematic                                                  Mean    Expected
                                   samples       1      2      3      4      5      6

  $1 - 0.2u$, ($u = 1, \ldots, 5$)    0.17        0.27   0.20   0.17   0.30   0.17   0.13      0.21    0.27
  $1 - 0.1u$, ($u = 1, \ldots, 10$)   0.53        0.62   0.58   0.53   0.60   0.53   0.53      0.57    0.60
  $2^{-u}$                            0.04        0.13   0.12   0.06   0.15   0.06   0.07      0.10    0.13
  $2^{-u/4}$                          0.58        0.66   0.64   0.60   0.66   0.60   0.60      0.63    0.65
  Kendall's Series 1                 -0.14        0.03   0.00  -0.05   0.16  -0.05   (illegible)  0.01    0.07


Naturally the use of this method of estimating the sampling error assumes that the 
correlation between the corresponding elements in each part or block into which the series 
is split may be neglected, i.e. in this case that the terms pie and above are negligible. In 


this case pu 1/16 and consequently the term2(TV ^ Pu — If ^ Pieu) 0.56^ required in 


(6) differs slightly from the term fs X pu » 0.65 which we are attempting to estimate. 

t*-i 


experienced. We shall consider this point further, after we have dealt with 
two-dimensional sampling. 
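
The frequencies of Table 1 can be reproduced by counting the lags between starting points. The Python sketch below is an illustration and not the author's computation: the function name is hypothetical, and the counting convention (each pair of starting points at distance $d$ contributes once to lag $d$ and once to lag $k - d$) is an assumption inferred from the tabulated entries.

```python
from collections import Counter
from itertools import combinations


def lag_frequencies(starts, k):
    """Frequency with which each lag u, paired with k - u, enters the
    estimate of variance, for systematic samples with the given starts.

    Assumed convention: a pair of starting points d apart contributes to
    both lag d and lag k - d, so lag k/2 is counted twice per pair.
    """
    freq = Counter()
    for a, b in combinations(sorted(starts), 2):
        d = b - a
        freq[d] += 1
        freq[k - d] += 1
    return {u: freq[u] for u in range(1, k // 2 + 1) if freq[u]}


print(lag_frequencies([4, 8, 12, 16], 16))   # evenly spaced: lags 4 and 8 only
print(lag_frequencies([4, 7, 8, 12], 16))    # first random set of Table 1
```

The evenly spaced starts touch only the lags 4, 8 and 12, while each of the random sets spreads its six pairs over a wider range of lags, which is the point made in the text about avoiding a systematic sampling of the correlogram.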

4. Methods of sampling in 2 dimensions. The number of ways in which we
can sample a two-dimensional space* is large, since we can employ random,


* We shall, in general, consider our two-dimensional space to be rectangular, but it is 
not difficult to draw similar conclusions for an area of any shape. 









stratified random or systematic sampling in either direction. Thus we will be
able to consider every possible combination of these methods, e.g. random in





Fig. 1. Graphical comparison of the efficiencies of systematic and stratified random
sampling for various correlation functions. The thick line gives the function

$f_1(u) = u\rho_u/d, \quad 0 \leq u \leq d$

$\phantom{f_1(u)} = \rho_u, \quad d \leq u,$

and the dotted line the function

$f_2(u) = \rho_{id}, \quad (i - 1)d < u \leq id.$

Thus systematic sampling is more or less efficient than stratified random sampling according 
to whether the area under the thick line is greater or less than the area under the dotted 
line. The most efficient method is indicated on each graph. 


one direction and systematic in another will be denoted by r-sy. Furthermore
we can consider the sets of samples in one direction to be aligned with one
another, or to be independently determined. The suffix 1 will be used to denote












M. H. QUENOUILLE



aligned samples while suffix 0 will denote independent samples, e.g. we might
sample according to the system $r_1 sy_0$. Examples of several methods of sampling
are given in Figure 2.











5. Accuracy of sampling in two dimensions. Suppose we consider a sample
of $n_1 n_2$ elements drawn from the elements $x_{ij}$ $(i = 1, 2, \dots, n_1 k_1;\ j = 1, 2, \dots, n_2 k_2)$,
(which form a single finite population drawn from an infinite hypothetical
population), such that the mean spacing in the two directions is $k_1$ and $k_2$.
These parameters will, if necessary, be indicated in brackets after the method of
sampling, e.g. $r_1 sy_0(n_1, k_1;\ n_2, k_2)$.

Let $X$ denote the mean of a sample formed by the method considered, and
$x'$ a member of this sample. Suppose, also, that the $x_{ij}$ are drawn from a
population in which

$$E(x_{ij}) = \mu, \qquad E(x_{ij}-\mu)^2 = \sigma^2, \qquad E(x_{ij}-\mu)(x_{i+u,\,j+v}-\mu) = \rho_{ijuv}\,\sigma^2.$$

Further we may average $\rho_{ijuv}$ over all possible values of $i$ and $j$ to define $\rho_{uv} = \rho_{-u,-v}$ by the relation

$$\sum_i\sum_j \rho_{ijuv} = (k_1 n_1 - |u|)(k_2 n_2 - |v|)\,\rho_{uv}.$$

The purpose of these definitions is to allow us to eliminate the difficulties associated
with the parameters of finite populations by considering this population as
being itself a sample from an infinite population. Cochran employs a similar
device.

5a. Random sampling. It is not difficult to see that

$$\sigma^2(X) = \tfrac{1}{2}E(X_1 - X_2)^2 = E(X_1-\mu)^2 - E(X_1-\mu)(X_2-\mu),$$

where $X_1$ and $X_2$ are independent samples.
Also

$$E(X_1-\mu)(X_2-\mu) = E(x_i-\mu)(x_j'-\mu) = \frac{\sigma^2}{k_1k_2n_1n_2}\Bigl[1 + \frac{1}{k_1k_2n_1n_2}\sum\sum (k_1n_1-|u|)(k_2n_2-|v|)\,\rho_{uv}\Bigr],$$

where the double summation* exists over the region $S$ given by $|u| < k_1 n_1$,
$|v| < k_2 n_2$ and excludes $u = v = 0$. We thus have to evaluate $E(X_1-\mu)^2$ for
the different types of random sampling.

It is easily shown that 


EiX, - pY = 


niHi 


1 + 


Win? — 1 


kikininiikikinini — 1 ) 


(^i«i - lw|)(^’i«2 



o 

niUi 




ni — 1 


kik^niikikiUin^ 


1) 

+ 


for ror^, 

23 23 (^i«i - I«l)(^n2 - I V j)p„ 

2(Th - 1) 




2^ (fcjnj — t;)pov j 


* In general, unless otherwise stated, double summations will exist over the region for
which the coefficients are positive, excluding $u = v = 0$.






+ r- ^Y ~ Y) 2 (*J”» “ ‘')P 0 . + ; ~ 2 (*1«1 - «)pi<ol 

kttitiktnt — 1) v^i kitiiikitii — 1) u-j J 


for nro, 

Z) 23 (kirh - I«|)(*jni - I V I)p«, 


for rifi, 


•whence 


• [" - k,h,n.Mhhn,n> - 1) rS fe». - I«!)(*.«. - I - l)P»] 

*(ri To) — ^ _ _——^ O’* F1 _ fci kj Tij 1 _ 

** ni«2\ kikij L (kih — l)kikinin2ikiktnint — 1) 


*23 13 (hni - I u DCAjnj - I V l)p„ 


2(nt - 
J\>2 7I^^A'2 ^2 


—tOpov 
'2 Ij 1 


a\ri ri) = f 1 - - —^ a fl_ ^ 2(^1 + ^2 ~ 1) - 1 

niW 2 \ kik 2 / L {kik 2 — 1)^1^2711^2(^1 feniTiQ — 1 ) 

•‘ff (t.™. - .W + ,, t _ -,)*S «•’.». -«)J. 

V"1 (A^l A»2 l/Tli^rCiTll T**! J 

5b. Stratified random sampling. We can deduce the variances for some methods
of taking stratified samples if $x'_i$, the mean of the elements sampled in the $i$th
stratum, is independent of $x'_j$, since we will then have

$$E(X-\bar{x})^2 = E(x'_i - \bar{x}_i)^2/n,$$

where $\bar{x}$ is the mean of the finite population which is sampled. Hence

$$\sigma^2(st_0 r_0) = \frac{1}{n_1}\,\sigma^2\{r_0 r_0(1, k_1;\ n_2, k_2)\}$$

$$= \frac{1}{n_1 n_2}\Bigl(1 - \frac{1}{k_1 k_2}\Bigr)\sigma^2\Bigl[1 - \frac{1}{k_1k_2n_2(k_1k_2n_2-1)}\sum\sum (k_1-|u|)(k_2n_2-|v|)\,\rho_{uv}\Bigr]. \tag{10}$$





9 {shr^ mnjO kiktna{kikt — 1) 

(11) -S 2 (1^1 - I « !)(**«* - I V |)p«. 

{kih - l)n*(fc*n* - 1) 
$$\sigma^2(st_0 st_0) = \frac{1}{n_1 n_2}\Bigl(1-\frac{1}{k_1k_2}\Bigr)\sigma^2\Bigl[1 - \frac{1}{k_1k_2(k_1k_2-1)}\sum\sum (k_1-|u|)(k_2-|v|)\,\rho_{uv}\Bigr]. \tag{12}$$
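A closed form of this kind can be checked on a small grid by brute force. The sketch below takes expression (12) to be $\frac{1}{n_1n_2}(1-\frac{1}{k_1k_2})\sigma^2[1-\frac{1}{k_1k_2(k_1k_2-1)}\sum\sum(k_1-|u|)(k_2-|v|)\rho_{uv}]$, which is a reading of a damaged display and therefore an assumption. It further assumes a stationary population, so that $\rho_{ijuv} = \rho_{uv}$, and an illustrative product-form correlation that is not taken from the paper. Every possible $st_0st_0$ sample (one element drawn at random from each $k_1 \times k_2$ stratum) is enumerated, and $E(X-\bar{x})^2$ is averaged over them.

```python
from itertools import product


def rho(u, v, r1=0.6, r2=0.3):
    """Illustrative correlation at lag (u, v); the product form is an assumption."""
    return r1 ** abs(u) * r2 ** abs(v)


def exact_variance(k1, k2, n1, n2, sigma2=1.0):
    """Average E(X - xbar)^2 over every possible st0 st0 sample."""
    cells = [(i, j) for i in range(k1 * n1) for j in range(k2 * n2)]
    big_n, n = len(cells), n1 * n2
    # One stratum per k1 x k2 block; one element is drawn from each stratum.
    strata = [
        [(i, j) for i in range(a * k1, (a + 1) * k1) for j in range(b * k2, (b + 1) * k2)]
        for a in range(n1)
        for b in range(n2)
    ]
    total, count = 0.0, 0
    for sample in product(*strata):
        chosen = set(sample)
        # Weight carried by each population element in X - xbar.
        w = [(1.0 / n if c in chosen else 0.0) - 1.0 / big_n for c in cells]
        total += sigma2 * sum(
            w[p] * w[q] * rho(cells[p][0] - cells[q][0], cells[p][1] - cells[q][1])
            for p in range(big_n)
            for q in range(big_n)
        )
        count += 1
    return total / count


def formula_12(k1, k2, n1, n2, sigma2=1.0):
    """Closed form (12) under the reading stated in the lead-in."""
    k = k1 * k2
    s = sum(
        (k1 - abs(u)) * (k2 - abs(v)) * rho(u, v)
        for u in range(-k1 + 1, k1)
        for v in range(-k2 + 1, k2)
        if (u, v) != (0, 0)
    )
    return sigma2 / (n1 * n2) * (1 - 1 / k) * (1 - s / (k * (k - 1)))


print(exact_variance(2, 2, 2, 2), formula_12(2, 2, 2, 2))
```

The two numbers agree to rounding error for the small cases tried, which supports the reading of (12) used here but does not replace the algebraic derivation.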

To estimate the variance of other methods of sampling, we will make use of a
general formula which we might have used to derive the expressions (8)-(12).

If $x'_i$ is any element of the sample $X$, then

$$(X-\bar{x})^2 = \frac{1}{n_1n_2}\Bigl[\sum (x'_i-\bar{x})^2 - \sum (x'_i - X)^2\Bigr]$$

$$= \frac{1}{n_1n_2}\Bigl[\sum (x'_i-\bar{x})^2 - \sum (x'_i-\mu)^2 + \frac{1}{n_1n_2}\sum\sum (x'_i-\mu)(x'_j-\mu)\Bigr],$$

whence

$$\sigma^2(X) = E(X-\bar{x})^2$$

$$= \frac{k_1k_2n_1n_2-1}{k_1k_2n_1n_2}\,\sigma^2\Bigl[1 - \frac{1}{k_1k_2n_1n_2(k_1k_2n_1n_2-1)}\sum\sum(k_1n_1-|u|)(k_2n_2-|v|)\,\rho_{uv}\Bigr] - \sigma^2 + \frac{\sigma^2}{n_1n_2} + \frac{n_1n_2-1}{n_1n_2}\,E(x'_i-\mu)(x'_j-\mu)$$

$$= \frac{1}{n_1n_2}\Bigl(1-\frac{1}{k_1k_2}\Bigr)\sigma^2\Bigl[1 - \frac{1}{k_1k_2n_1n_2(k_1k_2-1)}\sum\sum(k_1n_1-|u|)(k_2n_2-|v|)\,\rho_{uv} + \frac{k_1k_2(n_1n_2-1)}{k_1k_2-1}\,\frac{E(x'_i-\mu)(x'_j-\mu)}{\sigma^2}\Bigr]. \tag{13}$$

Thus, provided that we can estimate $E(x'_i-\mu)(x'_j-\mu)/\sigma^2$, the expression (13)
gives the error for all methods of sampling.

As an example, we might deduce the expression (12). If we choose any member
$x'_i$, then a second member $x'_j$ will be located at random with respect to $x'_i$ except
that there will be $k_1k_2 - 1$ positions in the same stratum as $x'_i$ that $x'_j$ will not be
able to occupy. Thus the expected correlation $E(x'_i-\mu)(x'_j-\mu)/\sigma^2$ will be
given by

$$\frac{1}{k_1^2k_2^2n_1n_2(n_1n_2-1)}\sum\sum(k_1n_1-|u|)(k_2n_2-|v|)\,\rho_{uv} - \frac{1}{k_1^2k_2^2(n_1n_2-1)}\sum\sum(k_1-|u|)(k_2-|v|)\,\rho_{uv}. \tag{14}$$
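The step from (14) through (13) to (12) is purely algebraic, so it can be checked numerically with arbitrary values of $\rho_{uv}$. The sketch below assumes the following readings of the damaged displays: the square bracket of (13) is $1 - S/\{k_1k_2n_1n_2(k_1k_2-1)\} + \{k_1k_2(n_1n_2-1)/(k_1k_2-1)\}\,c$, where $S$ is the double sum over the whole population and $c$ the expected correlation; (14) gives $c = S/\{k_1^2k_2^2n_1n_2(n_1n_2-1)\} - W/\{k_1^2k_2^2(n_1n_2-1)\}$, where $W$ is the double sum within one stratum; and the bracket of (12) is $1 - W/\{k_1k_2(k_1k_2-1)\}$. All function names are illustrative.

```python
import random


def lag_sum(a, b, rho):
    """Sum of (a-|u|)(b-|v|)*rho over |u| < a, |v| < b, excluding u = v = 0."""
    return sum(
        (a - abs(u)) * (b - abs(v)) * rho[(abs(u), abs(v))]
        for u in range(-a + 1, a)
        for v in range(-b + 1, b)
        if (u, v) != (0, 0)
    )


def bracket_13(k1, k2, n1, n2, rho, corr):
    """Square bracket of (13) for a given expected correlation `corr`."""
    k, n = k1 * k2, n1 * n2
    return 1 - lag_sum(k1 * n1, k2 * n2, rho) / (k * n * (k - 1)) + k * (n - 1) / (k - 1) * corr


def corr_14(k1, k2, n1, n2, rho):
    """Expected correlation between two members of an st0 st0 sample, as in (14)."""
    k, n = k1 * k2, n1 * n2
    return (lag_sum(k1 * n1, k2 * n2, rho) / (k * k * n * (n - 1))
            - lag_sum(k1, k2, rho) / (k * k * (n - 1)))


def bracket_12(k1, k2, rho):
    """Square bracket of (12)."""
    k = k1 * k2
    return 1 - lag_sum(k1, k2, rho) / (k * (k - 1))


rng = random.Random(0)
k1, k2, n1, n2 = 3, 2, 4, 5
# Arbitrary lag values: the identity is algebraic, so no valid correlogram is needed.
rho = {(u, v): rng.uniform(-1, 1) for u in range(k1 * n1) for v in range(k2 * n2)}
lhs = bracket_13(k1, k2, n1, n2, rho, corr_14(k1, k2, n1, n2, rho))
rhs = bracket_12(k1, k2, rho)
print(abs(lhs - rhs) < 1e-12)
```

The population-wide sum cancels between the two terms, leaving only the within-stratum sum, which is why the result does not depend on $n_1$ or $n_2$ inside the bracket.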





If we substitute (14) into (13), we will obtain expression (12) for the variance of
$st_0st_0$. In the same manner, we can derive for $st_1st_1$ the expression

E{xi - - It) 

_ _ . }■ - r ^ - 2 2 (^1 — \u \){k2nt — I V |)p«« 

- ~ Z Z (kmt - I tt 1)(A:2 - 1l)p„, 

hi h2 l^l 


(15) 


- Z Z (A-i -\umn,-\v I)p„. + iTT. Z Z (l^i -\u\) 


ki A’2 


// \ \\ \ 2(kik2ni 1) . 

(fc2 1 \)puv + - 1) 


2(fci fc2 — 1) ^ /, >1 2(^1 ^2 ^ 1) 

“ fci(ifci- 1) hi ^ hn^ihtH ~ 1) 


z 


p-i 


(kiUt — v)p(iv 


2(k\ki — 1) ^ 

h(h - 1) 



Thus we can evaluate $\sigma^2(X)$ for all types of stratified random sampling.

5c. Systematic sampling. In a similar manner to that used for stratified random
sampling, we can use (13) to evaluate the variances of systematic sampling.
Values of $E(x'_i-\mu)(x'_j-\mu)$ for three of the possible methods of sampling are
given below. For $sy_1sy_1$

(16) E(,Xi “ t^(xj p) ~ n\nt(ji\ni _ 1) I |)(^*2 1 |)piiu,tj» 


For syin 


(17) 


E{Xi - p)(x,' p) - _ f) Z Z («i I n I) 


Aj n 2 


• (fe«2 - - 1) S 


For sytfiyo 


1 r 1 

E{,Xi - p)(x,- - p) = _ 1 ) ^ 2] ih ni - I M 1) 

• *" I V \)puv kik 2 fll ^ ^ P(^'2 1 V \)puv 

~ 21 (^1 - I« 1)(A-2«2 - 1|)pu. + 

• (h - |p I)p». + Z Z Oh - I « l)(A* - 1 p |)p*i«.. 


( 18 ) 






- ij (*. - t.)p» + A 2 £ (fe - I „ |)(;h _ I „ 

- ^ - u)pA 

Ki tt.l J 

The derivation of (18) may be compared with that of (15).

6. Effect of alignment. We can examine the effect of alignment either by
an examination of the values of the variance of different samples, or by the
direct use of (13). For random and stratified random sampling, the effect of
alignment is to increase the variance of the sample by an amount

$$\sum\sum a_{uv}(\rho_{0v}-\rho_{uv}) + \sum\sum b_{uv}(\rho_{u0}-\rho_{uv}), \qquad \text{where } a_{uv} > 0,\ b_{uv} > 0.$$

This will be positive for monotonic decreasing correlation functions, and for the
majority of functions realised in practice. Thus alignment will usually increase
the variance for random and stratified random samples.

For systematic samples, the position is more complicated, but, roughly, the
variance is increased by an amount

$$\sum\sum a_{uv}(\rho_{k_1u,\,k_2v} - \bar{\rho}_{k_1u,\,k_2v}),$$

where $a_{uv} > 0$ and $\bar{\rho}_{k_1u,\,k_2v}$ is a mean over a rectangle, centre $\rho_{k_1u,\,k_2v}$, for $u$ and $v$
non-zero, and is a mean over a line, length $k_1$, centre $\rho_{0,\,k_2v}$, for $u$ zero (and similarly
for $v$ zero). Whether this is positive or negative will depend on the correlation
function, and it will have to be investigated for the types of correlation function
which are encountered.

7. Limiting forms. For a continuous process, when $n_1$ and $n_2$ are large, we
may, in the same manner as for linear sampling, obtain integral approximations
to the sampling variance, provided that $\sum\sum \rho_{d_1u,\,d_2v}$ converges.

We thus have


(19) 

<r*(ro To) = c^^stoT q) ^ ^ 


(20) 



(21) 


1- 

(22) 


IL """] 

(23) 

~ ^ ^ Li Li " 1 “ 1’'* " 1 

V |)pit« Su fipj 





<r* r 1 r* 

(24) 2 /*• 2 

• (dj — I V l)p«« 8u Sv ^ J puo 8u — ^ J — wpttO 

^ d2 fo ^ ^ fo "" 1 ^ 

~ ^ - a7, I. L '■• *“ *" 

(^®^ 2 0© -go 2 r** "I 

+ *.?.i."■■•'*'■*A 

^2 1*0000 *1 

(26) <r*(«j^isj/i) 2 2 Pd,«.<»,.-T-J f I Pu.«MiP I 

V——00 aiU2 •*-00 J~00 J 

A«.wJ ~;;^[l - ^ (* - hlWM." 

(27) + f f {di- l«|)(rfs - |t»|)p„,«t<«r 

(t\ Co2 J—dj J—di 

2 00 -d, 2 

+ ^2 / (<4 - I»|)pd,«,,5v - ;;jj / (<^1 - lvl)po,5t» 

u2 ti — i " flo J—dj ^2 J^df 

I ^ fidi 2 *1 

+ ^ 2 jL^ (<^1 “ 1«|)p«.<i**5« ~ ^ jLj “ l«l)puo5wJ. 

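8. Particular case where $\rho_{uv} = \rho_u\rho_v$. We note that, if $\rho_{uv} = \rho_u\rho_v$,* most of
these forms can be simplified greatly. If we write

$$sy_u = 1 - \frac{2}{d_1}\int_0^\infty \rho_u\,\delta u + 2\sum_{u=1}^{\infty}\rho_{d_1u},$$

$$st_u = 1 - \frac{2}{d_1^2}\int_0^{d_1}(d_1-u)\,\rho_u\,\delta u,$$

with similar forms for $sy_v$ and $st_v$, and, also

$$f_1 = \frac{2}{d_2}\int_0^\infty\rho_v\,\delta v,\qquad f_1' = \frac{2}{d_2^2}\int_0^{d_2}(d_2-v)\,\rho_v\,\delta v,\qquad f_1'' = 2\sum_{v=1}^\infty\rho_{d_2v},$$

$$f_2 = \frac{2}{d_1}\int_0^\infty\rho_u\,\delta u,\qquad f_2' = \frac{2}{d_1^2}\int_0^{d_1}(d_1-u)\,\rho_u\,\delta u,\qquad f_2'' = 2\sum_{u=1}^\infty\rho_{d_1u},$$

* A sufficient condition for this to be a valid autocorrelation function is that both $\rho_u$
and $\rho_v$ should be autocorrelation functions.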





then we have, for example, 


(28) 

(29) 

(30) 

(31) 

(32) 


<r\ri n) ~ (1 + fi), 


— (1 +/i +/»), 

Til 712 


<r*(s<o8<o) - (^StuStv “h Stu “h Stv\ 

111 712 


a (stiSti) ~ - (stuStv + + f 2 Btv), 

Til 712 


<r*(«yi«yi) ~- (ayuSyv +fi8yu +/*«!/.), 

Til 712 


(33) 

From these we get 


<r*(«yo m) — (8<» 8<. + f'l 8yu+ft syv). 

Til 712 


ffististi) - <r*(,syi8yi) ~ — 

(34) 


[(8<„S<. - sy««y.) +/i(s<u - sy„) +Mstv - sy,)], 


. , a (mm) ~ v*(8<o8to) — 

(36) «»”* 

• [{(1 - «yi.)(l - syv) - (1 - 8U(1 - 8<t.)) +fisyu +/i'syj, 


(36) <r*(s<o8<o) - ffisyosyo) - — [/((8<„ - syj + AisU - sy,)]. 

Til 712 


The forms (34), (35) and (36) enable us to compare the variances of the samples
in two dimensions by using the one-dimensional results. For most practical
cases, we know that the $f$'s are positive, $st_u > sy_u$ and $st_v > sy_v$, so that

$$\sigma^2(st_1st_1) > \sigma^2(sy_1sy_1) > \sigma^2(st_0st_0) > \sigma^2(sy_0sy_0). \tag{37}$$
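The inequality $st_u > sy_u$ can be illustrated for a Markoff correlation function. The sketch below assumes $\rho_u = e^{-u}$ and takes the one-dimensional quantities to be $sy_u = 1 - \frac{2}{d}\int_0^\infty\rho_u\,\delta u + 2\sum_{u\ge1}\rho_{du}$ and $st_u = 1 - \frac{2}{d^2}\int_0^d (d-u)\rho_u\,\delta u$; these forms are read from damaged displays and are assumptions. For this $\rho_u$ both integrals and the sum have elementary closed forms, which are used directly.

```python
import math


def sy(d):
    """1 - (2/d) * int_0^inf e^-u du + 2 * sum_{u>=1} e^{-d u} for rho_u = e^-u."""
    return 1 - 2 / d + 2 / math.expm1(d)


def st(d):
    """1 - (2/d^2) * int_0^d (d - u) e^-u du for rho_u = e^-u."""
    return 1 - 2 * (d - 1 + math.exp(-d)) / d ** 2


for d in (0.5, 1.0, 2.0, 4.0):
    print(f"d = {d:3.1f}   st_u = {st(d):.4f}   sy_u = {sy(d):.4f}")
```

For every spacing tried $st_u$ exceeds $sy_u$, and for small $d$ the ratio $st_u/sy_u$ approaches 2, since $st_u \approx d/3$ and $sy_u \approx d/6$ to first order in this one-dimensional sketch.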

The values of $\sigma^2(st_0st_0)/\sigma^2(r_0r_0)$, $\sigma^2(sy_1sy_1)/\sigma^2(r_0r_0)$, $\sigma^2(sy_0sy_0)/\sigma^2(r_0r_0)$ and
$\sigma^2(st_0st_0)/\sigma^2(sy_0sy_0)$ for $\rho_{d_1u} = \rho_1^{|u|}$ and $\rho_{d_2v} = \rho_2^{|v|}$ are given in table 3. It is not
difficult to show that for a given number of samples, ($d_1$, $d_2$ fixed), $\sigma^2(st_0st_0)$,
$\sigma^2(sy_1sy_1)$ and $\sigma^2(sy_0sy_0)$ are least when $\rho_1 = \rho_2$. The expressions tabulated have a
value of 1 for $\rho_1 = \rho_2 = 0$ and tend to limiting values of 0, 2/3, 0, and 2 respectively
as $\rho_1$ and $\rho_2$ tend to 1. It is interesting to note that for $\rho_1$ and $\rho_2$ differing
by more than 0.4 the grid imposed by $sy_1sy_1$ is less efficient than purely random
sampling. The type of function $\rho_{uv} = \rho_u\rho_v$* is, however, less likely to be realised


* For a town survey, we might find the correlation between two points depending on a
within-streets and a between-streets correlation, so that this function could be realised.




0.609 0.666 0.629 0.497 0.469 0.443 0.419 0.396 0.376 

0.706 0.721 0.788 0.914 1.134 1.632 2.362 4.911 <« 

0.462 0.416 0.380 0.362 0.328 0.307 0.289 0.272 0.267 

1.32 1.36 1.39 1.41 1.43 1.44 1.46 1.46 1.46 


0.616 0.476 0.441 
0.689 0.707 0.778 
0.365 0.327 0.297 
1.41 1.45 1.49 


0.394 0.360 0.329 0.300 0.272 0.247 

0.702 0.787 0.983 1.437 2.900 « 

0.266 0.229 0.206 0.185 0.167 0.161 

1.64 1.67 1.60 1.62 1.63 1.64 

























































in practice than a centrally-symmetric function, which is independent of the
choice of axes. For this reason, we consider next this latter type of function.


9. Centrally-symmetric correlation functions. Dedebant and Wehrlé [3]
have given a necessary and sufficient condition for $\rho(u, v)$ to be a correlation
function as


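$$\rho(u, v) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\cos(\omega u - \phi v)\,\delta F(\omega, \phi), \tag{38}$$

or alternatively,

$$f(\omega, \phi) = \frac{1}{4\pi^2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\cos(\omega u - \phi v)\,\rho(u, v)\,\delta u\,\delta v. \tag{39}$$

For a centrally-symmetric correlation function we can put $u = r\cos\theta$, $v = r\sin\theta$;
then $\rho(u, v) = \rho(r)$ and

$$f(\omega, \phi) = \frac{1}{4\pi^2}\int_0^\infty\int_0^{2\pi}\cos\bigl(r\sqrt{\omega^2+\phi^2}\,\cos\theta_1\bigr)\,\rho(r)\,r\,d\theta_1\,dr, \quad\text{where } \theta_1 = \theta + \tan^{-1}(\phi/\omega),$$

$$= \frac{1}{2\pi}\int_0^\infty J_0(r\tau)\,\rho(r)\,r\,dr, \quad\text{where } \tau = \sqrt{\omega^2+\phi^2}.$$

Thus, if $\rho(u, v)$ is centrally-symmetric, then so is $f(\omega, \phi)$ and conversely, so that
we get

$$f(\tau) = \frac{1}{2\pi}\int_0^\infty J_0(r\tau)\,\rho(r)\,r\,dr, \tag{40}$$

and

$$\rho(r) = 2\pi\int_0^\infty J_0(r\tau)\,f(\tau)\,\tau\,d\tau. \tag{41}$$

We can thus find suitable forms for $\rho(r)$ and $f(\tau)$. In this connection the formula
$\int_0^\infty J_0(yz)e^{-ay}\,dy = 1/(a^2+z^2)^{1/2}$, $a > 0$, is useful, since we can see that $\frac{\partial^n}{\partial a^n}(e^{-ay}/y)$
and $\frac{\partial^n}{\partial a^n}(a^2+z^2)^{-1/2}$ are possible functions for $2\pi f(\tau)$ and $\rho(r)$ although our choice
must be limited by the stochastic nature of $\rho(r)$ as well as by its convergence.
Thus, for example, $a = n = 0$ gives $1/2\pi\tau$ and $1/r$ as spectral and correlation
functions, but these will not converge.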
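The integral formula quoted above, taken here as $\int_0^\infty J_0(yz)e^{-ay}\,dy = (a^2+z^2)^{-1/2}$ (a reading of a damaged line, hence an assumption), can be checked numerically without any special-function library. The sketch below evaluates $J_0$ from Bessel's integral $J_0(x) = \frac{1}{\pi}\int_0^\pi\cos(x\sin t)\,dt$ by the midpoint rule and then applies Simpson's rule to the outer integral, truncated at a point where $e^{-ay}$ is negligible. The step counts and the truncation point are illustrative choices.

```python
import math


def bessel_j0(x, m=200):
    """J0(x) from Bessel's integral, midpoint rule with m points on (0, pi)."""
    h = math.pi / m
    return sum(math.cos(x * math.sin((i + 0.5) * h)) for i in range(m)) / m


def laplace_of_j0(a, z, upper=40.0, steps=4000):
    """Simpson's rule for the integral of J0(y z) exp(-a y) over 0 <= y <= upper."""
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        y = i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * bessel_j0(y * z) * math.exp(-a * y)
    return total * h / 3


a, z = 1.0, 2.0
numerical = laplace_of_j0(a, z)
closed_form = 1.0 / math.sqrt(a * a + z * z)
print(numerical, closed_form)
```

With $a = 1$ and $z = 2$ the two values agree to about six decimal places, which is the accuracy to be expected from this crude quadrature.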

In the linear case, the Markoff process $\rho(u) = e^{-au}$ had a spectral function
$f(\tau) = a/\pi(a^2+\tau^2)$ which is a Cauchy distribution in one dimension. If we take a
two-dimensional Cauchy distribution* as our spectral function we get $f(\tau) =$


* In the same way as the ordinary Cauchy distribution can be considered as a density
distribution on a line produced by a point source at a distance $a$, radiating in all directions,
so can a two-dimensional distribution be considered as a density distribution on a plane
from a source at distance $a$.





$a/2\pi(a^2+\tau^2)^{3/2}$ and $\rho(r) = -\frac{\partial}{\partial a}(e^{-ar}/r) = e^{-ar}$. Thus it appears that a generalised
Cauchy distribution will be the spectral function for a generalised Markoff
process.

We can, of course, consider an ’’elliptical” Markoff process given by* 
i»n\ f \ r«* 2mu» , 

but, in what follows, to simplify the computation, $m$ will be taken as zero, so
that by changing the units in which $d_1$ and $d_2$ are measured, we will work with a
process $\rho(r) = e^{-r}$.


TABLE 4

Comparison of observed serial correlations with theoretical values obtained from a
centrally-symmetric correlation function

Distance in miles | Rows: Observed, Calculated | Columns: Observed, Calculated | North-east: Observed, Calculated | South-east: Observed, Calculated

1 

0.332 

0.368 

0.310 

0.368 


— 

— 

— 

√2

— 

— 

— 

— 



0.264 

0.243 

2 

0.149 

0.135 

0.090 

0.135 

— 

— 

— 

— 

2√2

— 

— 

— 

— 



0.129 

0.059 

3 

0.009 

0.050 

-0.029 

0.050

— 

— 

— 

— 

3√2

— 

— 

— 

— 

-0.050 




4 

0.034 

0.018 

-0.041 

0.018 

— 

— 

— 

— 

4√2

— 

— 

— 

— 

-0.020 





This process does not seem to be far removed from the type of correlation 
function experienced in agricultural field work.* Osborne [4] has mentioned 
the possible use of p. = e~^. Mahalanobis [5] has calculated correlations for a- 
paddy field of 800 cells; his values are shown in table 4, together with values of
the function $e^{-r}$. Bearing in mind that the standard error of each of Mahalanobis'
values is approximately 0.035, the fit is seen to be quite good, although an
elliptical process with axes running south-east and north-east would undoubtedly
fit the observations better.
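The calculated values in table 4 follow from $\rho(r) = e^{-r}$ evaluated at the lattice distances. A minimal sketch, assuming the distances are 1, $\sqrt{2}$, 2, $2\sqrt{2}$, 3, $3\sqrt{2}$, 4 and $4\sqrt{2}$ miles (the diagonal labels are partly damaged in the table, so this list is an assumption):

```python
import math

# Lattice distances in miles along rows/columns and along the diagonals.
distances = {
    "1": 1.0,
    "sqrt2": math.sqrt(2),
    "2": 2.0,
    "2sqrt2": 2 * math.sqrt(2),
    "3": 3.0,
    "3sqrt2": 3 * math.sqrt(2),
    "4": 4.0,
    "4sqrt2": 4 * math.sqrt(2),
}

# Circular Markoff process rho(r) = exp(-r), rounded as in the table.
calculated = {label: round(math.exp(-r), 3) for label, r in distances.items()}

for label, value in calculated.items():
    print(f"{label:7s} {value:.3f}")
```

The values at distances 1, $\sqrt{2}$, 2, $2\sqrt{2}$, 3 and 4 reproduce the legible "Calculated" entries 0.368, 0.243, 0.135, 0.059, 0.050 and 0.018.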


* In this light, /)(r) •• e~*' will be called the circular Markoff process, while />«« — 
and pii« -> exp ] ^ g r known as degenerate Markoff processes of the first and 


second orders. 

* This is further supported by the fact that using a function of this kind it is possible to 
obtain numerically a law in substantial agreement with Fairfield-Smith’s law over a wide 
range of values. 
































10. The relative efficiencies of systematic and stratified random sampling.
Ideally the correlation functions developed in the last section should be used
in the expressions (19)-(27), but these functions are not capable of easy integration.
An alternative approach can be made if we note that


(43) 


a(sio8U) — g (syoSj/o) ^ (i 

<r*(roro) ^ di Ldi\ 


where 



+ 




F(u,d,) = I 


L*'0 02 Jdt 


Ptiv "" d% 


puv^u — di 



It is seen that $F(u, d_2)$ and $F(v, d_1)$ are extensions of the expressions obtained for
$(\sigma^2_{st} - \sigma^2_{sy})/\sigma^2$ in section 2. Hence, if $F(u, d_2)$ and $F(v, d_1)$ are both positive
functions, systematic sampling is more accurate than stratified random sampling.
A particular case of this occurs when $\rho_{uv} = \rho_1^u\rho_2^v$. However when $\rho_{uv} = \exp\{-(u^2+v^2)^{1/2}\}$,
$F(u, d_2)$ is not always positive, since, as $u$ increases, $\rho_{uv}$ becomes a
convex function of $v$. This complicates the interpretation of (43) greatly since it
appears that as $u$ varies from 0 to $d_1$, $F(u, d_2)$ varies from $+\infty$ to an unknown
value $X$. This value will be positive if $d_2 \gg d_1$ and negative if $d_1 \gg d_2$ so
that if the sampling is disproportionate in the two directions systematic sampling
will be more efficient than stratified random sampling. Furthermore, if $d_1 = d_2 = d$
and $d \to 0$, $F(u, d) \to \infty$ and systematic sampling again appears to be more
efficient. Thus in a wide variety of cases this type of systematic sampling, i.e.
$sy_0sy_0$, gives a more accurate result than random sampling.


11. Estimation of sampling errors. An examination of formulas (7)-(18) 
shows that the principles used for the estimation of linear errors can be used in 
plane sampling. If we consider that each sample can be broken up into inde¬ 
pendent units each of which is situated in one of s strata, then for q replications 
we will have $qr - s$ degrees of freedom for error. For example, $r_0r_0$, $r_0r_1$, $st_0r_0$
and $st_0r_1$ will have $qn_1n_2 - 1$, $qn_1 - 1$, $qn_1n_2 - n_1$ and $qn_1 - 1$ degrees of freedom
respectively, so that a single sample will contain an unbiased estimate of error,
but $st_0st_0$, $st_0st_1$, $st_1st_1$, $sy_0sy_0$ and $sy_1sy_1$ will have $n_1n_2(q-1)$, $n_1(q-1)$, $q-1$
and $q-1$ degrees of freedom and will require replication to form a valid estimate
of error. We can however use the method of splitting our sample into several 
parts each of which will give a fairly accurate estimate of error. We may, again, 
consider the possibility of using a set of systematic samples, which are evenly 
spaced, to estimate the sampling error, and we will see that the exclusion of the 
p’s of lower order may lead to appreciable bias unless the correlation between 





successive terms of the sample is small, but, as Yates has pointed out, this
method will provide an upper limit for our sampling error. These methods of 
sampling are illustrated by the examples given below. 

12. Examples. We shall consider the three methods of estimating the sampling 
errors of a systematic sample: 

(1) using sets of systematic samples randomly placed with respect to each
other, i.e. the material to be sampled is broken up into a series of sub-areas 
or blocks and several systematic samples are taken in each block; the 
error variance is calculated from the variances of the systematic samples 
in each block, 

(2) using one set of systematic samples randomly placed, i.e. several
systematic samples are taken and the area is then broken up into sub-areas
or blocks; the error variance is calculated from the variances of the 
portions of the systematic samples in each block, 

(3) using one systematic sample i.e. one systematic sample is taken which is 
broken into several systematic samples of wider spacing, e.g. four samples 
at four times the original spacing, the area is then divided into several 
sub-areas and the error variance is calculated from the variances of the 
portions of the sub-systematic samples in each block. 

These three methods are increasingly accurate in their estimation of the 
mean, increasingly biased in their estimation of the sampling variance, and 
decreasingly difficult in their practical application, so that our method of sam¬ 
pling may vary according to the population and according to the use to which the 
results are to be put. It is, for example, conceivable that subsequent sampling 
will yield an improved estimate of error so that initially only a rough guide 
may be required. 

a. If we are sampling from a continuous linear population with a large number 
of observations in each part into which we split our series, methods (1) and (2) 
will both give accurate estimates of the variance per term

$$\sigma^2\Bigl(1 - \frac{2}{d}\int_0^\infty\rho_u\,\delta u + 2\sum_{u=1}^{\infty}\rho_{du}\Bigr).$$

Method (3) will, however, estimate tr* instead of the correct variance per term, 
which is 

V* ^1 “ ^ pu^tt 2 2 P<»u/«^ . 

Thus the estimates of sampling variance by method (3) will in general be higher 
than the estimates by methods (1) and (2), although the actual variance will be 
lower. 

b. Kendall [6, 7] has constructed 480 terms of an artificial series
$u_{n+2} = 1.1u_{n+1} - 0.5u_n + \epsilon_{n+2}$ where the $\epsilon_n$ are rectangularly distributed from $-49$
to 49. For this series $\sigma^2 = 2379.81$ and $s^2 = 2535.11$. The series was split in six
parts of 80 terms, for each of which $n = 5$, $k = 16$, $q = 4$, so that 18 degrees of
freedom were available for error. The results for this sampling configuration are






given in table 5. The values in this table corroborate the conclusions for large 
samples of continuous populations. 
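The sampling configuration of this example can be imitated on a simulated series. The sketch below is not Kendall's series: it generates a fresh autoregressive series, assuming the scheme $u_{n+2} = 1.1u_{n+1} - 0.5u_n + \epsilon_{n+2}$ with integer disturbances uniform on $-49, \dots, 49$ (the coefficients are partly illegible in the source, so they are assumptions). It then applies an estimate in the spirit of method (1): in each of the six parts of 80 terms, $q = 4$ systematic samples of spacing $k = 16$ with random starts are compared. The seeds, the burn-in length and all function names are illustrative.

```python
import random


def kendall_like_series(length=480, phi1=1.1, phi2=-0.5, seed=1, burn_in=200):
    """Generate u[t+2] = phi1*u[t+1] + phi2*u[t] + e[t+2], e uniform on -49..49."""
    rng = random.Random(seed)
    u = [0.0, 0.0]
    for _ in range(length + burn_in):
        u.append(phi1 * u[-1] + phi2 * u[-2] + rng.randint(-49, 49))
    return u[-length:]


def systematic_means(part, k, starts):
    """Means of the systematic samples of spacing k beginning at the given starts."""
    return [sum(part[s::k]) / len(part[s::k]) for s in starts]


def pooled_variance_per_term(series, part_len=80, k=16, q=4, seed=2):
    """Pooled between-sample variance, scaled to a variance per term."""
    rng = random.Random(seed)
    n = part_len // k                      # terms in each systematic sample
    sum_squares, df = 0.0, 0
    for begin in range(0, len(series), part_len):
        part = series[begin:begin + part_len]
        means = systematic_means(part, k, rng.sample(range(k), q))
        centre = sum(means) / q
        sum_squares += sum((m - centre) ** 2 for m in means)
        df += q - 1                        # q - 1 degrees of freedom per part
    return n * sum_squares / df, df


series = kendall_like_series()
estimate, df = pooled_variance_per_term(series)
print(f"estimated variance per term {estimate:.0f} on {df} degrees of freedom")
```

The six parts contribute $q - 1 = 3$ degrees of freedom each, giving the 18 degrees of freedom quoted in the text; the numerical estimate depends on the simulated series and is not comparable with the entries of table 5.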

c. A number of uniformity trials were taken and sampled according to the
systems $st_1st_1$ and $sy_1sy_1$. For sampling according to the system $st_1st_1$ the error
TABLE 5

Comparison of three methods of estimating the sampling error of systematic samples
for an autoregressive scheme

Method     Estimate of sampling variance per term, $s^2$,     True sampling
           based on 18 degrees of freedom                     variance per term

(1)                        3228                                     2170
(2)                        1872                                     2167
(3)                        3709                                      423


TABLE 6

Comparison of efficiencies of different methods of sampling on three uniformity trials

Source                              Kalamkar [8]     Wiebe [9]        Wynne Sayer and Krishna Iyer [10]
No. in Cochran's [11] Catalogue     72               132              108
Crop                                Potatoes         Wheat            Sugar cane
No. of plots                        576              1440             960
Mean                                23.262           587.95           270.89
Variance per term                   15.555           10,018.0*        1794.42

                                    Potatoes                              Wheat                                 Sugar cane
Type of sampling                    $st_1st_1$  $sy_1sy_1$  $sy_1sy_1$    $st_1st_1$  $sy_1sy_1$  $sy_1sy_1$    $st_1st_1$  $sy_1sy_1$  $sy_1sy_1$
Proportion sampled                  1/6         1/6         1/6           1/9         1/9         1/9           1/8         1/8         1/8
Method of estimating error                      (2)         (3)                       (2)         (3)                       (2)         (3)
No. of partitions                   1           4           4             1           4           4             1           5           5
$n_1$                               3           3           6             4           2           4             4           2           4
$k_1$                               2           2           1             3           6           3             2           4           2
$n_2$                               16          2           4             20          5           10            15          3           6
$k_2$                               6           12          6             6           6           3             8           8           4
$q$                                 2           4           1             2           4           1             2           4           1
Mean                                23.140      23.435      23.323        586.54      598.65      275.29        275.29      266.72      271.27
Estimated variance per term         9.763       2.689       4.889         5151.6      5772.7      7038.5        1320.15     799.29      1269.54
Degrees of freedom of
estimated variance                  48          12          12            80          12          12            60          15          15


* Based on the original 1500 plots. 


was estimated by taking two samples per stratum, while, for sampling according
to the system $sy_1sy_1$, the error was estimated by comparing sets of four samples
in each part of the series by methods (2) and (3). The results of this sampling are
shown in table 6. While the number of trials is small, the trend to be seen in the
results agrees very well with the conclusions reached above.
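The sampling configurations of table 6 can be checked for internal consistency. The sketch below assumes that the rows of the table are, in order, the number of partitions, $n_1$, $k_1$, $n_2$, $k_2$ and $q$ (several row labels are damaged in the source, and the entry read here as $n_1 = 6$ for the third potato column is also an assumption). In each column the number of partitions times $n_1k_1n_2k_2$ should equal the number of plots, and $q/(k_1k_2)$ should equal the proportion sampled.

```python
from fractions import Fraction

# (label, plots, partitions, n1, k1, n2, k2, q, proportion sampled)
configurations = [
    ("Potatoes   st1st1",     576, 1, 3, 2, 16,  6, 2, Fraction(1, 6)),
    ("Potatoes   sy1sy1 (2)", 576, 4, 3, 2,  2, 12, 4, Fraction(1, 6)),
    ("Potatoes   sy1sy1 (3)", 576, 4, 6, 1,  4,  6, 1, Fraction(1, 6)),
    ("Wheat      st1st1",    1440, 1, 4, 3, 20,  6, 2, Fraction(1, 9)),
    ("Wheat      sy1sy1 (2)", 1440, 4, 2, 6,  5,  6, 4, Fraction(1, 9)),
    ("Wheat      sy1sy1 (3)", 1440, 4, 4, 3, 10,  3, 1, Fraction(1, 9)),
    ("Sugar cane st1st1",     960, 1, 4, 2, 15,  8, 2, Fraction(1, 8)),
    ("Sugar cane sy1sy1 (2)", 960, 5, 2, 4,  3,  8, 4, Fraction(1, 8)),
    ("Sugar cane sy1sy1 (3)", 960, 5, 4, 2,  6,  4, 1, Fraction(1, 8)),
]

checks = []
for label, plots, parts, n1, k1, n2, k2, q, proportion in configurations:
    plots_ok = parts * n1 * k1 * n2 * k2 == plots      # partitions cover the whole trial
    proportion_ok = Fraction(q, k1 * k2) == proportion  # q samples at spacing k1 x k2
    checks.append(plots_ok and proportion_ok)
    print(f"{label:24s} plots {plots_ok}  proportion {proportion_ok}")
```

All nine columns satisfy both relations under this reading, which supports the assignment of the row labels used in the rebuilt table.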
























13. Trend in the population. Frequently in taking samples from a population,
we are faced with the problem of a trend. This will not greatly affect random and
stratified random samples as estimates of the population mean, but the efficiency
of systematic samples will be affected to a large extent. If we consider linear
sampling, and denote by $S_i$ the sample whose first element is $x_i$, then the set of
samples $S_i$ will usually be monotonic with $i$ and the difference between $S_1$ and
$S_k$ will be large (roughly equal to $x_1 - x_k$).

Yates [1] has suggested a method to overcome this difficulty; by letting $S_i$
represent

if*! I I . k — i 1 

^ i hfc + ‘ “ + («-2)* T-a:n+(n-l)» I, 


the difference between systematic samples due to trend is largely removed.
It is easily seen that this necessitates a small loss of information, and in particular,
for a continuous random population the variance is $(n - \tfrac{4}{3})\sigma^2/(n-1)^2$ instead
of $\sigma^2/n$. For plane samples, the corresponding adjusted sample will be


Sii 


(ni - l)(n4 


_ ij 

— 1 ) \_kiki 


^ + • • • + 


j(h - i) 

ki kt 




+ + Xnjci,i+k» + • • • + 


ikx - i) 

ki 




H-I-T- + * * • H-(nj-l)*i ,j+(n. 


kiki 


i»2 


-DfcjJ 


with a similar loss of information. 

Trend is, however, most likely to be appreciable in large samples, and in this 
case, the loss of information due to end adjustments is negligible, so that the 
conclusions reached above will remain unaltered. 
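The loss of information from end adjustments in the linear case can be illustrated numerically. The sketch below assumes that the adjusted sample gives weight $w$ to the first element, weight $1 - w$ to the last, unit weight to the $n - 2$ interior elements, and divides by $n - 1$, with $w$ uniformly distributed between 0 and 1 as the starting point varies; this weighting is an assumption, since the display defining the adjusted sample is damaged. For uncorrelated elements of variance $\sigma^2$ the average variance factor is then $(n - \tfrac{4}{3})/(n-1)^2$.

```python
def adjusted_variance_factor(n, w):
    """Variance of the end-adjusted mean in units of sigma^2, for end weight w."""
    return ((n - 2) + w * w + (1 - w) ** 2) / (n - 1) ** 2


def mean_factor(n, grid=1000):
    """Average of the variance factor over w uniform on (0, 1), midpoint rule."""
    return sum(adjusted_variance_factor(n, (i + 0.5) / grid) for i in range(grid)) / grid


for n in (5, 10, 50):
    adjusted = mean_factor(n)
    print(f"n = {n:3d}   adjusted {adjusted:.5f}   unadjusted {1 / n:.5f}")
```

The adjusted factor exceeds $1/n$ by a margin that shrinks rapidly as $n$ grows, in line with the remark that the loss is negligible in large samples.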

The author wishes to thank Dr. F. Yates and Professor M. S. Bartlett for 
advice in the preparation of this paper. 


REFERENCES

[1] W. G. Cochran, “The relative accuracy of systematic and stratified random samples
for a certain class of populations,” Annals of Math. Stat., Vol. 17 (1946), p. 164.

[2] F. Yates, “A review of recent statistical developments in sampling and sampling
surveys,” Roy. Stat. Soc. Jour., Vol. 109 (1946), p. 12.

[3] G. Dedebant and P. Wehrlé, “Mécanique aléatoire,” Portugaliae Physica, Vol. 1
(1946).

[4] J. G. Osborne, “Sampling errors of systematic and random surveys of cover-type
areas,” Am. Stat. Asso. Jour., Vol. 37 (1942), p. 256.

[5] P. C. Mahalanobis, “On large-scale sample surveys,” Roy. Soc. Phil. Trans., B, Vol. 231
(1944), p. 329.

[6] M. G. Kendall, “On the analysis of oscillatory time-series,” Roy. Stat. Soc. Jour.,
Vol. 108 (1945), p. 93.

[7] M. G. Kendall, Contributions to the Study of Oscillatory Time-Series, Nat. Inst. Econ.
Soc. Res., 1946.

[8] R. J. Kalamkar, “Experimental errors and the field-plot technique with potatoes,”
Jour. Agr. Sci., 1932, p. 373.

[9] G. A. Wiebe, “Variation and correlation in grain-yield among 1500 wheat nursery
plots,” Jour. Agr. Res., 1936.

[10] Wynne Sayer and P. V. Krishna Iyer, “On some of the factors that influence the
error of field experiments with special reference to sugarcane,” Ind. Jour. Agr.
Sci., 1936, p. 917.

[11] W. G. Cochran, “Catalogue of uniformity trial data,” Roy. Stat. Soc. Suppl. Jour.,
Vol. 4 (1937), p. 233.

[12] F. Yates, “Systematic sampling,” Roy. Soc. Phil. Trans., Vol. 241 (1948), p. 346.



REPRESENTATION OF PROBABILITY DISTRIBUTIONS BY
CHARLIER SERIES*

By R. P. Boas, Jr.

Brown University

Summary. The paper describes some results concerning the representation
of a function by linear combinations of the successive differences of the Poisson
distribution, not necessarily the partial sums of the type B series of Charlier.

1. Introduction. For various purposes it is often desired to expand a probability
distribution $f(x)$ in a series

$$f(x) \sim \sum_{k=0}^{\infty} c_k\,\theta_k(x), \tag{1}$$

where the $\theta_k(x)$ are a given set of standard functions. Arguments of a heuristic
nature led Charlier [4, 5, 6] to suggest that it would be useful to take the $\theta_k(x)$
in (1) to be either the successive derivatives or the successive differences of some
fixed function; the two cases are often referred to as type A series and type B
series, respectively. Charlier gave formulas for determining the coefficients in the
two cases, but the question of whether the formal series represents the given
function in any reasonable sense has to be investigated separately for each
particular choice of the function generating the series. Only one special case of
each type has been much used: for the A-series, $\theta_0(x)$ is the normal density
function $(2\pi)^{-1/2}e^{-x^2/2}$; for the B-series, $\theta_0(x)$ is the Poisson function $e^{-\lambda}\lambda^x/x!$ (when $x$
is restricted to take only nonnegative integral values). We shall refer only to
these special cases when we speak of A- and B-series in this paper.

There are two distinct problems (which have, however, often been confused)
connected with the representation of a function $f(x)$ by a series (1); for convenience,
we shall refer to them in this paper as the practical problem and the
theoretical problem. In the practical problem, we have an empirical function $f(x)$,
defined only for a finite number of values of $x$, which we suspect is representable
by $c_0\theta_0(x)$ together with a small correction, so that we hope that a few (say three
or four) terms of (1) may give a good representation of $f(x)$ in a relatively simple
analytical form with a reasonable amount of computational labor. In some cases,
and certainly with the classical A- and B-series which we are considering, we
could represent, as closely as desired, any $f(x)$ (however irregular) which takes
nonzero values at only a finite number of points; but there is no interest in doing
this if the process involves finding too many terms of the series. (Neglect of this
fact has led to ill-founded statements by mathematicians about the satisfactory
nature of the A- or B-series; but see [27, pp. 38-39].)

Thus it would be of interest to know, if possible, under what circumstances a
given empirical density can be represented fairly well by a few terms of a series
of a given kind. If no simple criterion can be given, it is desirable to have a means

* Address delivered by invitation at the meeting of the Institute at Boulder, Colorado, 
on September 1,1949. 


REPRESENTATIONS OF DISTRIBUTIONS


of computing coefficients which will make a few terms of (1) give the best possible
fit, best possible being defined in a way appropriate for the problem at hand.

In the theoretical problem, $f(x)$ is a function defined for all values of $x$, or at least
for all of an infinite set of equally spaced values of $x$, arising from theoretical
considerations which suggest $c_0\theta_0(x)$ as a reasonable first approximation to $f(x)$.
For example, the central limit theorem states that under certain conditions the
cumulative distribution function of the sum of a large number of independent
random variables is approximately normal; then we might expect that this
distribution function would be representable by a series (1) with $\theta_0(x)$ the normal
distribution function. For such theoretical purposes we should like to have
criteria for the representability of a sufficiently general f(x) by a series (1), 
where representability is of course to be interpreted appropriately, as ordinary 
convergence, uniform convergence, convergence in mean square, asymptotic 
representation, etc., according to the requirements of the problem at hand. The 
larger the class of f(x) for which we can prove a representation theorem, the 
larger is the possible domain of applicability of the series to theoretical problems. 

2. The A-series. This paper is concerned with the B-series, but for comparison
we first mention some properties of the A-series. In the case of the classical
A-series, we have the attractive fact that the functions $\theta_n(x)$ are orthogonal
with weight function $e^{x^2/2}$, that is,

$$\int_{-\infty}^{\infty}\theta_n(x)\,\theta_m(x)\,e^{x^2/2}\,dx = 0, \qquad m \ne n.$$

In fact, $\theta_n(x)e^{x^2/2}$ is, except for a numerical factor, the $n$th Hermite polynomial.

This orthogonality property enables one to compute the coefficients in a series (1)
with great ease from

$$n!\,c_n = \int_{-\infty}^{\infty} f(x)\,\theta_n(x)\,e^{x^2/2}\,dx, \tag{2}$$

or since $\theta_n(x)e^{x^2/2}$ is a polynomial, from the moments of $f(x)$. By the classical
theory of orthogonal functions, this means that if the $c_n$ are so computed, and we
take $N + 1$ terms of the series, we minimize

$$\int_{-\infty}^{\infty} e^{x^2/2}\,[f(x) - F_N(x)]^2\,dx \tag{3}$$

for all possible sums

$$F_N(x) = \sum_{n=0}^{N} c_n\,\theta_n(x). \tag{4}$$

The convergence theory of Hermite series has been thoroughly investigated by
mathematicians, so that it would appear that in theoretical problems, in which
$f(x)$ is given for all values of $x$, we are in a position to find out everything about
the representation of $f(x)$ by an A-series. Also in problems of practical curve-



378 


B. P. BOAS, JB. 


fitting, the fact that the closest approximation to/(x) (in the sense (3)) by sums 
of the form (4) is given by choosing the coefficients according to (2) seems to 
leave no more to be said. 

However, the formal elegance of the A-series seems to be somewhat misleading. 
Even when a series converges it by no means follows that its Nth partial sum is 
the best selection of N terms for representing a given function. Even though the 
partial sums do give the best fit in the sense of (3), it may not be desirable to 
measure the closeness of approximation by (3); some other measure of 
approximation may be better suited to the end in view. For example, it is known that 
the partial sums of Edgeworth's series (see [8]), which is a rearrangement of the 
A-series, are more satisfactory for some purposes than the partial sums of the 
A-series with the coefficients determined by (2). More precisely, Edgeworth's 
series furnishes an asymptotic expansion, with a remainder term whose order of 
magnitude can be estimated quite precisely, in circumstances where the series of 
orthogonal functions does not do this. Again, for practical purposes a few terms 
of the A-series sometimes exhibit undesirable properties (such as negative 
frequencies). If f(x) is a function defined only for integral values of x, A. Fisher 
[10] has suggested and applied the idea of minimizing, not (3), but the sum 
$\sum_x [f(x) - F_N(x)]^2$ in order to determine the coefficients of the approximating 
sums. 


3. The B-series. We can now see how the status of the B-series resembles or 
differs from that of the A-series. Here we deal principally with a function defined 
for integral values of x; $\theta_0(x) = \theta(x) = e^{-\lambda}\lambda^x/x!$, $\Delta\theta(x) = \theta(x) - \theta(x-1)$, 
$\Delta^k\theta(x) = \Delta(\Delta^{k-1}\theta(x))$ and $\theta_k(x) = \Delta^k\theta(x)$; $\theta(x)$ is taken to be 0 for negative 
integral x. We shall refer to this as the discrete case of the B-series. The 
literature of the subject contains a number of rather painful attempts to put the 
coefficients into usable form, persisting even after the simple formula 

$$c_n = \frac{1}{n!}\sum_{j=0}^{n}\binom{n}{j}(-1)^j\lambda^{n-j}M_j \tag{5}$$

had been obtained, where $M_n$ is the nth factorial moment, 

$$M_n = \sum_{k=n}^{\infty} f(k)\,\frac{k!}{(k-n)!}.$$

Formula (5) can be derived, for example, by using orthogonality properties of 
the $\theta_n(x)$. We have, in fact, that $\sum_{x=0}^{\infty}\theta_n(x)\theta_m(x)/\theta_0(x)$ is 0 or $n!\,\lambda^{-n}$ according as 
$n \neq m$ or $n = m$. 
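A small numerical sketch may help fix the notation. It computes the factorial moments $M_n$ and the classical coefficients $c_n = \frac{1}{n!}\sum_j \binom{n}{j}(-1)^j\lambda^{n-j}M_j$, and checks the orthogonality relation $\sum_x \theta_n(x)\theta_m(x)/\theta_0(x) = n!\,\lambda^{-n}$ for $n = m$ (0 otherwise). The input is the observed column of Table 1 with x = 5, ..., 10 relabelled 0, ..., 5; that relabelling, the function names and the truncation point `x_max` are assumptions of this sketch.

```python
from math import comb, exp, factorial

def theta(x, lam):
    """Poisson function e^{-lam} lam^x / x!, taken as 0 for negative x."""
    return exp(-lam) * lam ** x / factorial(x) if x >= 0 else 0.0

def delta_theta(n, x, lam):
    """n-th difference of theta at x, with Delta theta(x) = theta(x) - theta(x-1)."""
    return sum(comb(n, j) * (-1) ** j * theta(x - j, lam) for j in range(n + 1))

def factorial_moment(f, n):
    """M_n = sum_k f(k) k!/(k-n)! for frequencies f[0], f[1], ..."""
    return sum(fk * (factorial(k) // factorial(k - n))
               for k, fk in enumerate(f) if k >= n)

def classical_coefficient(f, lam, n):
    """c_n = (1/n!) sum_j C(n,j) (-1)^j lam^(n-j) M_j."""
    total = sum(comb(n, j) * (-1) ** j * lam ** (n - j) * factorial_moment(f, j)
                for j in range(n + 1))
    return total / factorial(n)

def weighted_product(n, m, lam, x_max=80):
    """sum_x theta_n(x) theta_m(x) / theta_0(x), truncated at x_max."""
    return sum(delta_theta(n, x, lam) * delta_theta(m, x, lam) / theta(x, lam)
               for x in range(x_max + 1))

observed = [133, 55, 23, 7, 2, 2]   # Table 1 frequencies, x = 5..10 relabelled 0..5
lam = factorial_moment(observed, 1) / factorial_moment(observed, 0)   # the mean
c = [classical_coefficient(observed, lam, n) for n in range(3)]
print(lam, c)
```

With $\lambda$ set to the mean (about 0.6306 here, in agreement with the $\lambda = .631$ quoted in Table 1), $c_0$ is the total frequency and $c_1$ vanishes up to rounding.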

The parameter $\lambda$ in the B-series is at our disposal, and can for example be 
chosen in such a way as to improve the convergence of the series. For purposes of 
practical curve-fitting, it has been customary to choose $\lambda$ equal to the mean of 
the distribution f(x), a choice which makes the coefficient $c_1$ of $\Delta\theta$ equal to zero. 
Charlier also suggested other methods in which $c_1$ and $c_2$, or $c_1$, $c_2$ and $c_3$ are 
zero [7]. Such choices, of course, may reduce the amount of computation needed 
to make use of a given number of differences in fitting a curve; aside from this 
consideration their use seems to depend on the belief that one improves the 
convergence of a series by adjusting any available parameters so that as many as 
possible of the initial terms of the series are zero. This belief does not always 
seem to be confirmed by the facts. (In particular, compare columns 2 and 5 of 
Table 1, columns 2 and 4 of Table 2, or columns 2 and 4 of Table 3.) 

The theoretical problem of what f(x) can be represented by convergent 
B-series has been studied by several authors [12, 13, 17, 19, 20, 21, 23, 24, 26, 28]; 
the study by Schmidt [24; see also 25 and 17] gives necessary and sufficient 
conditions for the representation in the case of a nonnegative f(x), so that, at 
least in all cases of interest in statistics, the theoretical problem seems to be 
completely solved. However, one of the purposes of the present paper is to 
reopen this apparently closed problem. 

There is also a continuous version of the B-series, which is suggested by the 
fact that 

$$\theta(x) = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{-\lambda}\,e^{-ixu}\exp(\lambda e^{iu})\,du \tag{6}$$

reduces to the Poisson function $e^{-\lambda}\lambda^x/x!$ for positive integral x (and to 0 for 
negative integral x). This form of the B-series has not been much used, and its 
use is subject to suspicion since it has rather peculiar properties. In particular, 
it cannot represent, in any reasonable sense, a positive function f(x) or one which 
is too small as $x \to \infty$ [26, 3]; since the functions which present themselves for 
representation in practice are both positive and small at infinity, the continuous 
case of the B-series looks unpromising for applications. (See also [27a], [1a].) 
However, it has been applied [15]. 

The purpose of this paper is to describe some results on the B-series which 
have been obtained in a mathematical paper [3], devoted to what we have 
called the theoretical problem; some contributions to the practical problem 
will also be given in the present paper. The starting point of this investigation 
was the question of what happens if one tries to approximate a function, not 
by the partial sums of the series (1), but by some other combination of the 
first N functions $\theta_n(x)$, when approximation is taken in the sense of (unweighted) 
least squares. This method of approximation seems well adapted to statistical 
problems, and leads to simpler mathematical work than ordinary point-by-point 
convergence of the partial sums. The B-series itself gives a least-squares 
approximation with a weight function $1/\theta_0(x)$. We consider here only the classical B-series, 
when $\theta_0(x) = \theta(x) = e^{-\lambda}\lambda^x/x!$, $\theta_n(x) = \Delta^n\theta(x)$; the main results are substantially 
the same for rather more general cases [3; see also 14, 25]. In addition, here we 
consider only nonnegative f(x), assumed zero for negative x. Functions which 
need not be zero for negative x are handled easily by generalizing the B-series 
to the form [3] 

$$f(x) \sim \sum_n b_n\,\nabla^n\theta(x) + \sum_n a_n\,\Delta^n\theta(x), \tag{7}$$

where $\nabla$ denotes the advancing difference: $\nabla\theta(x) = \theta(x) - \theta(x+1)$; there 
seems to be no particular reason (other than a historical one) for preferring one 
kind of difference to the other. The generalized series (7) might be useful for 
graduating symmetrical probability distributions, although it does not seem to 
have been considered in the literature (cf. [1a]). 

4. Results: practical problem. Our question takes somewhat different forms 
in the two cases which we have described as the practical and the theoretical. 
In the former, we ask what the coefficients $a_k^{(N)}$ should be so that 

$$\sum_{x=0}^{\infty}\Bigl|f(x) - \sum_{k=0}^{N} a_k^{(N)}\Delta^k\theta(x)\Bigr|^2 \tag{8}$$

shall be a minimum, where f(x) is an empirically given function and N is a given 
integer, in general not very large. If N is 0, 1 or 2, that is, if we use 1, 2 or 3 
terms, the best choice of the $a_k^{(N)}$ in (8) can be calculated without difficulty. 

For N = 0, our question is that of finding the best least-squares fit to f(x) 
by a Poisson distribution $a_0^{(0)}\theta(x)$; the best choice of $a_0^{(0)}$ is then 

$$a_0^{(0)} = e^{2\lambda}\sum_{x=0}^{\infty} f(x)\,\theta(x) \Big/ J_0(2i\lambda), \tag{9}$$

where 

$$J_0(iy) = 1 + \frac{y^2}{2^2} + \frac{y^4}{2^2\cdot 4^2} + \cdots$$

($J_0$ denotes the Bessel function of order 0); on the other hand, the usual formula 
(5) gives the different coefficient 

$$c_0 = M_0 = \sum_{x=0}^{\infty} f(x).$$

This, of course, is simpler than (9) to compute, although its use is based on the 
uncritical assumption that the first term of the series (1) is the best one to take 
if only one term is to be used. Charlier [7; see also 10, pp. 101-103] suggested a 
different formula in which one uses, not $\Delta^k\theta(x)$, but $\Delta^k\theta(px + q)$, the parameters 
p, q, $\lambda$ being adjusted to make the terms of (1) in $\Delta\theta$, $\Delta^2\theta$, $\Delta^3\theta$ all zero; here $\theta(x)$ 
is defined when x is not an integer by interpreting $e^{-\lambda}\lambda^x/x!$ as $e^{-\lambda}\lambda^x/\Gamma(x+1)$, 
and not by using formula (6). Table 2 shows that in at least one numerical case 
(9) gives a better least-squares fit than Charlier's method (and without 
introducing gamma functions to take care of $\theta(x)$ for fractional x). However, it is 
not excluded that Charlier's method will give better results in other cases, 
since with the change of the functions $\theta_n(x)$ the results of this paper cease to 
apply. 
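The single-term case can be illustrated numerically. The sketch below evaluates $a_0^{(0)} = e^{2\lambda}\sum_x f(x)\theta(x)/J_0(2i\lambda)$ with the power series for $J_0(iy)$, and compares it with the value obtained directly from the normal equation $\sum f\theta / \sum\theta^2$. The data are the observed frequencies of Table 1, with x = 5, ..., 10 relabelled 0, ..., 5 (an assumption of this sketch, as are the function names and the truncation points).

```python
from math import exp, factorial

def theta(x, lam):
    """Poisson function e^{-lam} lam^x / x!, taken as 0 for negative x."""
    return exp(-lam) * lam ** x / factorial(x) if x >= 0 else 0.0

def bessel_j0_imag(y, terms=40):
    """J_0(iy) = sum_k (y/2)^(2k) / (k!)^2 = 1 + y^2/2^2 + y^4/(2^2 4^2) + ..."""
    return sum((y / 2) ** (2 * k) / factorial(k) ** 2 for k in range(terms))

def best_single_term(f, lam):
    """a_0^(0) = e^{2 lam} * sum_x f(x) theta(x) / J_0(2 i lam)."""
    s0 = sum(fx * theta(x, lam) for x, fx in enumerate(f))
    return exp(2 * lam) * s0 / bessel_j0_imag(2 * lam)

def direct_single_term(f, lam, x_max=80):
    """Same coefficient from the normal equation: sum f*theta / sum theta^2."""
    num = sum(fx * theta(x, lam) for x, fx in enumerate(f))
    den = sum(theta(x, lam) ** 2 for x in range(x_max + 1))
    return num / den

observed = [133, 55, 23, 7, 2, 2]   # Table 1 frequencies, x = 5..10 relabelled 0..5
lam = 0.631
a00 = best_single_term(observed, lam)
fit = [a00 * theta(x, lam) for x in range(len(observed))]
print(a00, fit)
```

The two routes agree to rounding error, and the first two fitted values come out close to the 119.9 and 75.6 printed in column 3 of Table 1.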

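For N = 1, we get the best least-squares approximation to f(x) by 

$$a_0^{(1)}\theta(x) + a_1^{(1)}\Delta\theta(x)$$

if 

$$e^{-2\lambda}a_0^{(1)} = \frac{1}{\alpha+\beta}\,(\Sigma_0 + \Sigma_1), \qquad e^{-2\lambda}a_1^{(1)} = \frac{\beta\,\Sigma_0 - \alpha\,\Sigma_1}{\alpha^2-\beta^2}, \tag{10}$$

where $\Sigma_0 = \sum_{x=0}^{\infty} f(x)\theta(x)$, $\Sigma_1 = \sum_{x=0}^{\infty} f(x)\theta(x-1)$, $\alpha = J_0(2i\lambda)$, $\beta = -iJ_1(2i\lambda)$, 
the J's again denoting Bessel functions. For N = 2, the corresponding 
formulas involve also $\gamma = -J_2(2i\lambda)$ and $\Sigma_2 = \sum_{x=0}^{\infty} f(x)\theta(x-2)$. They are: 

$$e^{-2\lambda}a_0^{(2)} = \frac{\beta-\alpha}{2\beta^2-\alpha^2-\alpha\gamma}\,\Sigma_0 + \frac{2\beta-\alpha-\gamma}{2\beta^2-\alpha^2-\alpha\gamma}\,\Sigma_1 + \frac{\beta-\alpha}{2\beta^2-\alpha^2-\alpha\gamma}\,\Sigma_2,$$

$$e^{-2\lambda}a_1^{(2)} = \frac{\beta\gamma-\alpha\beta+2\beta^2-2\alpha\gamma}{(\alpha-\gamma)(2\beta^2-\alpha^2-\alpha\gamma)}\,\Sigma_0 + \frac{\alpha+\gamma-2\beta}{2\beta^2-\alpha^2-\alpha\gamma}\,\Sigma_1 + \frac{2\alpha^2-2\beta^2+\beta\gamma-\alpha\beta}{(\alpha-\gamma)(2\beta^2-\alpha^2-\alpha\gamma)}\,\Sigma_2, \tag{11}$$

$$e^{-2\lambda}a_2^{(2)} = \frac{\alpha\gamma-\beta^2}{(\alpha-\gamma)(2\beta^2-\alpha^2-\alpha\gamma)}\,\Sigma_0 + \frac{\beta}{2\beta^2-\alpha^2-\alpha\gamma}\,\Sigma_1 + \frac{\beta^2-\alpha^2}{(\alpha-\gamma)(2\beta^2-\alpha^2-\alpha\gamma)}\,\Sigma_2.$$

The functions $i^{-n}J_n(iy)$ are real for real y, and extensive tables are available [32]. 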

Some numerical examples showing the comparison between graduation by 
these formulas and by the corresponding number of terms of the B-series are 
given in Tables 1-3. It will be noticed that (as the theory indicates) one gets a 
better least-squares fit by formulas (9), (10) or (11) than by a corresponding 
number of terms of the B-series using the coefficients (5). However, one may not 
get a better fit if goodness of fit is measured in some other way, e.g. by $\chi^2$. 
Unfortunately the coefficients calculated by this method increase rapidly in 
complexity as the number of terms increases, and even the coefficients for N = 3 
would involve very heavy algebra. Since numerical examples [2] indicate that it 
is often necessary to go to terms in $\Delta^4\theta$ for a satisfactory fit, it might be worth 
while to calculate the next few coefficients. 
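As a cross-check on the two- and three-term formulas, the following sketch evaluates the closed forms for N = 1 and N = 2 (written in terms of $\alpha = J_0(2i\lambda)$, $\beta = -iJ_1(2i\lambda)$, $\gamma = -J_2(2i\lambda)$ and the sums $\Sigma_j = \sum_x f(x)\theta(x-j)$) and compares them with a direct solution of the least-squares normal equations. The Bessel values are obtained from the series $i^{-n}J_n(iy) = \sum_k (y/2)^{2k+n}/(k!\,(k+n)!)$; the small elimination routine, the function names, the truncation at `x_max`, and the use of the Table 1 frequencies (x = 5, ..., 10 relabelled 0, ..., 5) are all assumptions of this sketch.

```python
from math import comb, exp, factorial

def theta(x, lam):
    return exp(-lam) * lam ** x / factorial(x) if x >= 0 else 0.0

def delta_theta(k, x, lam):
    return sum(comb(k, j) * (-1) ** j * theta(x - j, lam) for j in range(k + 1))

def bessel_i(n, y, terms=40):
    """i^{-n} J_n(iy) = sum_k (y/2)^(2k+n) / (k! (k+n)!)."""
    return sum((y / 2) ** (2 * k + n) / (factorial(k) * factorial(k + n))
               for k in range(terms))

def closed_form(f, lam, N):
    """Coefficients a_k^(N) for N = 1 or N = 2 from the closed formulas."""
    al, be, ga = (bessel_i(n, 2 * lam) for n in range(3))
    s0, s1, s2 = (sum(fx * theta(x - j, lam) for x, fx in enumerate(f))
                  for j in range(3))
    e = exp(2 * lam)
    if N == 1:
        return [e * (s0 + s1) / (al + be),
                e * (be * s0 - al * s1) / (al * al - be * be)]
    d = 2 * be * be - al * al - al * ga
    a0 = ((be - al) * (s0 + s2) + (2 * be - al - ga) * s1) / d
    a1 = ((be * ga - al * be + 2 * be * be - 2 * al * ga) * s0 / (al - ga)
          + (al + ga - 2 * be) * s1
          + (2 * al * al - 2 * be * be + be * ga - al * be) * s2 / (al - ga)) / d
    a2 = ((al * ga - be * be) * s0 / (al - ga) + be * s1
          + (be * be - al * al) * s2 / (al - ga)) / d
    return [e * a0, e * a1, e * a2]

def solve(a, b):
    """Gaussian elimination with partial pivoting (illustrative helper)."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            t = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= t * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def normal_equations(f, lam, N, x_max=80):
    """Least-squares coefficients obtained by solving the normal equations."""
    basis = [[delta_theta(k, x, lam) for x in range(x_max + 1)] for k in range(N + 1)]
    gram = [[sum(p * q for p, q in zip(bj, bk)) for bk in basis] for bj in basis]
    rhs = [sum(fx * bj[x] for x, fx in enumerate(f)) for bj in basis]
    return solve(gram, rhs)

observed = [133, 55, 23, 7, 2, 2]   # Table 1 frequencies, x = 5..10 relabelled 0..5
lam = 0.631
two_terms = closed_form(observed, lam, 1)
three_terms = closed_form(observed, lam, 2)
print(two_terms, normal_equations(observed, lam, 1))
print(three_terms, normal_equations(observed, lam, 2))
```

Since the differences $\Delta^k\theta$ with $k \geq 1$ sum to zero over x, the leading coefficient equals the total of the fitted frequencies; here it comes out near the totals 207.7 and 221.7 printed in columns 4 and 5 of Table 1.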


5. Results: theoretical problem. In the case of a theoretical distribution we 
ask how coefficients $a_k^{(N)}$ should be determined so that 

$$\sum_{x=0}^{\infty}\Bigl|f(x) - \sum_{k=0}^{N} a_k^{(N)}\Delta^k\theta(x)\Bigr|^2 \tag{12}$$

will tend to 0 as $N \to \infty$. The convergence to 0 of (12) is a rather strong kind of 
convergence, since it implies convergence of the approximating sums to f(x), 
not only for each x, but even uniformly for all x. Of course, the "best" choice of 
$a_k^{(N)}$ as above would be expected to give convergence under the weakest 
hypotheses, but because of the complexity of these coefficients it seems desirable to 
make (12) only approximately a minimum; this actually makes no difference 
in the limit, although the approximation is not usually satisfactory for small 
values of N. To see the connection between the formulas used here and the 
"classical" formula (5) for the coefficients in (1), we note that (5) can be written 

$$a_n = \frac{1}{n!}\sum_{k=0}^{\infty} f(k)\,\Bigl[\frac{\partial^n}{\partial s^n}\bigl(e^{\lambda s}(1-s)^k\bigr)\Bigr]_{s=0}; \tag{13}$$

(5) results if we expand the derivative by Leibniz's rule and rearrange the sum. 
If we expand in a power series before differentiating in (13) (writing 
$e^{\lambda s} = e^{\lambda}e^{-\lambda(1-s)}$ and expanding in powers of $1-s$), we obtain 

$$a_n = (-1)^n e^{\lambda}\sum_{k=0}^{\infty} f(k)\sum_{l=\max(k,n)}^{\infty}\binom{l}{n}\frac{(-\lambda)^{l-k}}{(l-k)!} = (-1)^n e^{\lambda}\sum_{l=n}^{\infty}\binom{l}{n}\sum_{k=0}^{l}\frac{(-\lambda)^{l-k}}{(l-k)!}\,f(k).$$

If now we break this series off at $l = N$ to obtain 

$$a_n^{(N)} = (-1)^n e^{\lambda}\sum_{l=n}^{N}\binom{l}{n}\sum_{k=0}^{l}\frac{(-\lambda)^{l-k}}{(l-k)!}\,f(k), \tag{14}$$

we obtain a sequence of approximations to f(x) by sums $\sum_{n=0}^{N} a_n^{(N)}\Delta^n\theta(x)$ which 
has, in general, much better convergence properties than the partial sums of the 
B-series with coefficients $a_n$ given by (5). In particular, if f(x) = 0 for x = -1, 
-2, ..., this sequence of approximations converges to f(x) whenever $\sum_{x=0}^{\infty}|f(x)|^2$ 
converges; on the other hand, for nonnegative f(x) it is known [24] that the 
B-series converges if and only if $\lim_{x\to\infty} f(x)\,2^x x^k = 0$ for k = 0, 1, 2, ..., a much 
more restrictive condition. If we demand that the partial sums of the B-series 
converge in mean square, that is, that (12) tends to zero with $a_k^{(N)}$ independent of 
N, we have the even more restrictive condition [3] that $\limsup_{x\to\infty}\{f(x)\}^{1/x} \leq \tfrac{1}{3}$. 

The approximating sums with coefficients (14) have the additional property 
that they reproduce f(x) exactly for x = 0, 1, 2, ..., N. One would expect 
that in general they would then tend to deviate rather widely from f(x) for 
larger x, and so would not be satisfactory for practical curve-fitting. However, 
it seems possible that if we fit such a sum not to f(x), but to f(px + q), with 
suitable integers p and q, thus making the approximation agree with f(x) at a 
set of values covering the whole range of definition of f(x), it might give a 
satisfactory fit elsewhere. This possibility has not been investigated; a similar 
approach using the partial sums of the B-series was suggested by Charlier [7] 
and Fisher [10]. 
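The exact-reproduction property is easy to observe numerically. The sketch below computes the truncated coefficients $a_n^{(N)} = (-1)^n e^{\lambda}\sum_{l=n}^{N}\binom{l}{n}\sum_{k=0}^{l}\frac{(-\lambda)^{l-k}}{(l-k)!}f(k)$ for N = 2 and evaluates the approximating sum $\sum_n a_n^{(N)}\Delta^n\theta(x)$. As before, the Table 1 frequencies with x = 5, ..., 10 relabelled 0, ..., 5 serve as data; that relabelling and the function names are assumptions made for illustration.

```python
from math import comb, exp, factorial

def theta(x, lam):
    return exp(-lam) * lam ** x / factorial(x) if x >= 0 else 0.0

def delta_theta(n, x, lam):
    return sum(comb(n, j) * (-1) ** j * theta(x - j, lam) for j in range(n + 1))

def coefficients_14(f, lam, N):
    """a_n^(N), n = 0..N, by formula (14); f must list at least f(0), ..., f(N)."""
    inner = [exp(lam) * sum(f[k] * (-lam) ** (l - k) / factorial(l - k)
                            for k in range(l + 1))
             for l in range(N + 1)]
    return [(-1) ** n * sum(comb(l, n) * inner[l] for l in range(n, N + 1))
            for n in range(N + 1)]

def approximation(a, x, lam):
    """Value at x of the sum of a_n * Delta^n theta(x)."""
    return sum(an * delta_theta(n, x, lam) for n, an in enumerate(a))

observed = [133, 55, 23, 7, 2, 2]   # Table 1 frequencies, x = 5..10 relabelled 0..5
lam = 0.631
a = coefficients_14(observed, lam, 2)
fit = [approximation(a, x, lam) for x in range(len(observed))]
print(fit)
```

The first three fitted values coincide with the data to rounding error, and the later ones drift away from it (about 9.1 against an observed 7 at the fourth point), in line with column 6 of Table 1.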

6. The continuous case of the B-series. In the continuous case we again ask 
not when 

$$f(x) = \sum_{n=0}^{\infty} a_n\,\Delta^n\theta(x) \tag{15}$$

with uniform convergence in every finite interval, but when 

$$f(x) = \operatorname*{l.i.m.}_{N\to\infty}\sum_{n=0}^{N} a_n^{(N)}\Delta^n\theta(x), \tag{16}$$

which means that 

$$\lim_{N\to\infty}\int_{-\infty}^{\infty}\Bigl|f(x) - \sum_{n=0}^{N} a_n^{(N)}\Delta^n\theta(x)\Bigr|^2 dx = 0. \tag{17}$$

For (15) the following negative results are known [26]: if $f(x) \geq 0$, (15) cannot 
converge uniformly on every finite interval (unless $f(x) \equiv 0$); the series, if 
convergent uniformly on every finite interval, cannot converge to f(x) unless 
the Fourier transform of f(x) vanishes outside $(-\pi, \pi)$, a condition which 
automatically excludes any f(x) which vanishes for all large |x| or even is too 
small as $x \to \infty$. Nevertheless, Jørgensen [15] applies the continuous case 
successfully to practical problems. A possible explanation of this apparent discrepancy 
is that if the $a_n^{(N)}$ in (16) are properly determined, (16) will be true under fairly 
general conditions. To be sure, the mean square difference in (17) cannot be 
made arbitrarily small unless the Fourier transform g(x) of f(x) vanishes outside 
$(-\pi, \pi)$, but if $|f(x)|^2$ is integrable the difference can be made small if g(x) is 
itself small there. If g(x) does vanish outside $(-\pi, \pi)$, then (16) is true; and in fact 
the coefficients can be taken the same as in (14), so that the approximating 
sums depend only on the values of f(x) for integral values of x; these values are 
known to determine f(x) under our hypotheses on g(x). 

TABLE 1 
Number of petals on buttercups. $\lambda = .631$ 

            (1)         (2)           (3)           (4)           (5)           (6)
  x      Observed    Calculated    Calculated    Calculated    Calculated    Calculated
         frequency   3 terms       1 term        2 terms       3 terms       3 terms
                     (formula 5)   (formula 9)   (formula 10)  (formula 11)  (formula 14)
  5        133        134.9         119.9         130.6         132.9         133.0
  6         55         51.6          75.6          62.3          55.3          55.0
  7         23         22.5          22.5          13.3          22.1          23.0
  8          7          9.5           5.0           1.5           8.5           9.1
  9          2          2.9           0.8           0.0           2.4           2.6
 10          2          0.6           0.1           0.0           0.5           0.5
Total      222        222.0         223.9         207.7         221.7         223.2

7. Discussion of some numerical results. Table 1. Column 2 gives the fit by 
two terms of the B-series (really three, since the coefficient of $\Delta\theta$ is zero when 
formula (5) is used), as calculated by Charlier [7] (that is, using terms through 
$\Delta^2\theta$). Column 3 gives the best least-squares fit by a single term, i.e., a Poisson 
distribution, calculated by formula (9); it is clear that this term alone does not 
represent the observations very well. Column 4 gives the best least-squares 
fit by terms through $\Delta\theta$. Column 5 gives the best least-squares fit by terms 
through $\Delta^2\theta$; the improvement over Charlier's fit by the same number of terms 
is evident by inspection. Column 6 gives, for comparison, the same number 
of terms calculated by formula (14), which gives an approximation to the best 
least-squares fit and necessarily reproduces the data exactly for the first three 
values of x. The fact that (14) gives good results here is presumably connected 
with the small size of $\lambda$. 

Table 2. Column 2 gives the values calculated by Charlier [7] for a fit after 
the linear transformation $x \to px + q$, with $\lambda$, p and q chosen to make the terms 
in $\Delta\theta$, $\Delta^2\theta$, $\Delta^3\theta$ all zero (the values were read to the nearest integer from Charlier's 
graph). Column 3 gives the best least-squares single-term fit calculated by 
formula (9); this is a considerable improvement for $x \leq 6$, but for the remainder 
of the table it is rather poor. Column 4 gives the best least-squares fit by two 
terms; column 5, that by three. The $\chi^2$-test indicates that the graduation is 
rather poor in all cases. 

TABLE 2 
Failure of grains of barley. $\lambda = 2.757$ 

            (1)         (2)           (3)           (4)           (5)
  x      Observed    Calculated    Calculated    Calculated    Calculated
         frequency   4 terms       1 term        2 terms       3 terms
                     (Charlier)    (formula 9)   (formula 10)  (formula 11)
  0         53         63            47.3          49.9          48.4
  1        131        139           130.4         134.7         133.4
  2        180        174           179.8         181.6         182.3
  3        170        151           165.3         163.2         164.3
  4        111        111           113.9         110.0         109.8
  5         50         60            62.7          59.3          58.1
  6         22         32            28.8          26.5          25.2
  7         22         14            11.4          10.2           9.3
  8          7          6             3.9           3.4           2.9
  9          2          2             1.1           1.0           0.8
 10          1          0             0.3           0.2           0.2
Total      749        752           744.9         740.0         734.7

Table 3. Column 2 gives the classical calculation with terms through $\Delta^2\theta$; 
this was given by A. Fisher [10] and (more accurately) by Aroian [2]. Columns 3 
and 4 give the best least-squares approximations by two and three terms; 
column 4 is better than column 2, in this sense, as expected. However, column 4 
is a poorer fit when tested by $\chi^2$, chiefly because of the poor fit at x = 0. It should 
be noted that two more terms of the B-series give a more satisfactory fit [2]. 

TABLE 3 
$\alpha$-particles from a bar of polonium. $\lambda = 3.87155$ 

            (1)         (2)           (3)           (4)
  x      Observed    Calculated    Calculated    Calculated
         frequency   3 terms       2 terms       3 terms
                     (formula 5)   (formula 10)  (formula 11)
  0         57         49.5          51.3          45.2
  1        203        201.3         213.3         190.9
  2        383        403.4         399.0         393.5
  3        525        532.3         524.8         529.8
  4        532        520.6         517.2         525.4
  5        408        402.6         407.7         409.7
  6        273        254.8         267.7         261.9
  7        139        137.1         150.6         141.1
  8         45         64.0          74.1          65.3
  9         27         26.1          32.4          26.3
 10         10          9.4          12.8           9.3
 11          4          3.0           4.6           2.9
 12          0          0.9           1.5           0.8
 13          1          0.2           0.5           0.2
 14          1          0.0           0.1           0.0
Total     2608       2605.2        2657.6        2602.3

                     $\chi^2$ = 10.2   $\chi^2$ = 16.2   $\chi^2$ = 11.4
                     n = 7         n = 8         n = 7
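The least-squares comparison between the columns of Table 3 can be reproduced from the printed values alone. The short sketch below simply sums the squared deviations of each calculated column from the observed frequencies; the dictionary keys and function name are labels chosen here for illustration. The $\chi^2$ values are not recomputed, since the grouping of cells used for them is not stated.

```python
observed = [57, 203, 383, 525, 532, 408, 273, 139, 45, 27, 10, 4, 0, 1, 1]

# Calculated columns of Table 3, as printed.
columns = {
    "col2": [49.5, 201.3, 403.4, 532.3, 520.6, 402.6, 254.8, 137.1,
             64.0, 26.1, 9.4, 3.0, 0.9, 0.2, 0.0],   # 3 terms, formula (5)
    "col3": [51.3, 213.3, 399.0, 524.8, 517.2, 407.7, 267.7, 150.6,
             74.1, 32.4, 12.8, 4.6, 1.5, 0.5, 0.1],  # 2 terms, formula (10)
    "col4": [45.2, 190.9, 393.5, 529.8, 525.4, 409.7, 261.9, 141.1,
             65.3, 26.3, 9.3, 2.9, 0.8, 0.2, 0.0],   # 3 terms, formula (11)
}

def sum_of_squares(fitted, data):
    """Unweighted sum of squared deviations between data and fitted values."""
    return sum((d - v) ** 2 for d, v in zip(data, fitted))

ss = {name: sum_of_squares(values, observed) for name, values in columns.items()}
print(ss)
```

On the printed figures the sums of squares are roughly 1388, 1664 and 1010 for columns 2, 3 and 4, so column 4 is indeed the closest in the unweighted least-squares sense.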


8. Proofs: theoretical problem. We now outline the proofs of the results which 
we have stated. They depend on the fact that the numbers $\theta(x)$ (x = 0, ±1, 
±2, ...) (where $\theta(x) = 0$ when x is a negative integer) are the Fourier coefficients 
of the function $\varphi(u) = e^{-\lambda}\exp(\lambda e^{iu})$, i.e. 

$$\theta(x) = (2\pi)^{-1}\int_{-\pi}^{\pi}\varphi(u)\,e^{-ixu}\,du, \qquad x = 0, \pm 1, \pm 2, \cdots.$$

Furthermore, 

$$\Delta^k\theta(x) = (2\pi)^{-1}\int_{-\pi}^{\pi}\varphi(u)\,(1 - e^{iu})^k\,e^{-ixu}\,du.$$

If we then assume the condition $\sum_{x=0}^{\infty}|f(x)|^2 < \infty$, with f(x) = 0 for x = 
-1, -2, ..., the numbers f(x) are the Fourier coefficients of a function g(u) 
of integrable square, by the Riesz-Fischer theorem from the theory of Fourier 
series [31, p. 74]: 

$$f(x) = (2\pi)^{-1}\int_{-\pi}^{\pi} g(u)\,e^{-ixu}\,du, \qquad x = 0, \pm 1, \pm 2, \cdots.$$

Thus 

$$f(x) - \sum_{k=0}^{N} a_k^{(N)}\Delta^k\theta(x) = (2\pi)^{-1}\int_{-\pi}^{\pi} e^{-ixu}\Bigl[g(u) - \varphi(u)\sum_{k=0}^{N} a_k^{(N)}(1 - e^{iu})^k\Bigr]du, \tag{18}$$

and so the expressions on the left appear as the Fourier coefficients of the 
expressions in square brackets on the right. By Parseval's theorem for Fourier series 
[31, p. 76], then, we have 

$$\sum_{x=-\infty}^{\infty}\Bigl|f(x) - \sum_{k=0}^{N} a_k^{(N)}\Delta^k\theta(x)\Bigr|^2 = (2\pi)^{-1}\int_{-\pi}^{\pi}\Bigl|g(u) - \varphi(u)\sum_{k=0}^{N} a_k^{(N)}(1 - e^{iu})^k\Bigr|^2 du. \tag{19}$$
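Both facts used here (that $\Delta^k\theta(x)$ is the xth Fourier coefficient of $\varphi(u)(1-e^{iu})^k$, and the Parseval identity) can be spot-checked numerically. The sketch below approximates $(2\pi)^{-1}\int_{-\pi}^{\pi}h(u)e^{-ixu}\,du$ by an equally spaced rectangle rule, which is very accurate for smooth periodic integrands. The data (Table 1 frequencies relabelled to start at x = 0), the trial coefficients, the node count `m` and all function names are assumptions of the sketch.

```python
import cmath
from math import comb, exp, factorial, pi

def theta(x, lam):
    return exp(-lam) * lam ** x / factorial(x) if x >= 0 else 0.0

def delta_theta(k, x, lam):
    return sum(comb(k, j) * (-1) ** j * theta(x - j, lam) for j in range(k + 1))

def phi(u, lam):
    """phi(u) = e^{-lam} exp(lam e^{iu})."""
    return exp(-lam) * cmath.exp(lam * cmath.exp(1j * u))

def transform_of_difference(k, lam):
    """The function phi(u) (1 - e^{iu})^k whose coefficients are Delta^k theta."""
    return lambda u: phi(u, lam) * (1 - cmath.exp(1j * u)) ** k

def fourier_coefficient(h, x, m=256):
    """(2 pi)^{-1} * integral over (-pi, pi) of h(u) e^{-ixu} du, m-point rectangle rule."""
    nodes = [-pi + 2 * pi * j / m for j in range(m)]
    return sum(h(u) * cmath.exp(-1j * x * u) for u in nodes) / m

lam = 0.631
worst = max(abs(fourier_coefficient(transform_of_difference(k, lam), x)
                - delta_theta(k, x, lam))
            for k in range(3) for x in range(-2, 8))

# Parseval check with arbitrary trial coefficients a_0, a_1.
observed = [133, 55, 23, 7, 2, 2]
a = [200.0, 30.0]

def g(u):
    return sum(fx * cmath.exp(1j * x * u) for x, fx in enumerate(observed))

def bracket(u):
    return g(u) - sum(ak * transform_of_difference(k, lam)(u) for k, ak in enumerate(a))

lhs = sum(((observed[x] if x < len(observed) else 0.0)
           - sum(ak * delta_theta(k, x, lam) for k, ak in enumerate(a))) ** 2
          for x in range(81))
rhs = fourier_coefficient(lambda u: abs(bracket(u)) ** 2, 0).real
print(worst, lhs, rhs)
```

The two sides of the Parseval identity agree to rounding error for any choice of the trial coefficients, which is what allows the minimization to be carried out on the integral instead of the sum.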

Thus we have reduced the problem of minimizing the mean-square difference 
on the left of (19) to that of minimizing the integral on the right of (19). By 
rearranging the sum in the integrand, we see that an equivalent problem is to 
minimize 

$$D = (2\pi)^{-1}\int_{-\pi}^{\pi}\Bigl|g(u) - \varphi(u)\sum_{k=0}^{N} c_k^{(N)}e^{iku}\Bigr|^2 du, \tag{20}$$

where the $a_k^{(N)}$ and $c_k^{(N)}$ are readily expressed in terms of each other; in fact, 

$$c_k^{(N)} = (-1)^k\sum_{n=k}^{N}\binom{n}{k}a_n^{(N)}, \qquad a_n^{(N)} = (-1)^n\sum_{k=n}^{N}\binom{k}{n}c_k^{(N)}. \tag{21}$$

Since $|\varphi(u)| = e^{-\lambda(1-\cos u)} \geq e^{-2\lambda} > 0$, we can write D in the form 

$$D = (2\pi)^{-1}\int_{-\pi}^{\pi}\Bigl|g(u)/\varphi(u) - \sum_{k=0}^{N} c_k^{(N)}e^{iku}\Bigr|^2\,|\varphi(u)|^2\,du,$$

so that 

$$\int_{-\pi}^{\pi}\Bigl|g(u)/\varphi(u) - \sum_{k=0}^{N} c_k^{(N)}e^{iku}\Bigr|^2 du \;\geq\; 2\pi D \;\geq\; e^{-4\lambda}\int_{-\pi}^{\pi}\Bigl|g(u)/\varphi(u) - \sum_{k=0}^{N} c_k^{(N)}e^{iku}\Bigr|^2 du, \tag{22}$$

since $e^{-2\lambda} \leq |\varphi(u)| \leq 1$. Thus we can make D arbitrarily small if and only if 
we can make 

$$D^* = (2\pi)^{-1}\int_{-\pi}^{\pi}\Bigl|g(u)/\varphi(u) - \sum_{k=0}^{N} c_k^{(N)}e^{iku}\Bigr|^2 du \tag{23}$$

arbitrarily small. Now the Fourier coefficients of g(u) are f(x); those of $1/\varphi(u)$ 
are $e^{\lambda}(-\lambda)^x/x!$ for $x \geq 0$, 0 for x < 0; by the convolution theorem for Fourier 
coefficients [31, p. 90] the nth Fourier coefficient of $g(u)/\varphi(u)$ is 

$$\sum_{k=0}^{n} f(n-k)\,e^{\lambda}(-\lambda)^k/k!, \qquad n = 0, 1, 2, \cdots, \tag{24}$$

and zero for n < 0. Furthermore, it is well known from the theory of Fourier 
series that $D^*$ is a minimum if the $c_k^{(N)}$ are chosen as the first N + 1 Fourier 
coefficients of $g(u)/\varphi(u)$, and that this minimum is arbitrarily small for large enough N 
if and only if the Fourier coefficients of $g(u)/\varphi(u)$ are zero for negative indices, 
which is in fact the case. If we then take the values (24) for $c_k^{(N)}$, k = 0, 1, ..., 
N, and express $a_k^{(N)}$ in terms of $c_k^{(N)}$ by (21), we arrive at the formula (14). 

It will be observed that the minimum D is connected with the minimum $D^*$ by 

$$\min D \;\leq\; \max|\varphi(u)|^2\cdot\min D^* \;\leq\; \min D^* \;\leq\; \frac{1}{\min|\varphi(u)|^2}\,\min D,$$

so that all that we can say about the approximation given by (14) with a small N 
is that it is an upper bound for the best possible mean-square approximation by 
sums (18), and that the best mean-square approximation is at worst $e^{-4\lambda}$ times 
it. This means that if $D^*$ is small, so is D; but $D^*$ is not necessarily small even if 
D is. Hence we cannot in general expect the coefficients (14) to be suitable for 
practical curve-fitting, since they may increase the mean-square error by a 
factor of as much as $e^{4\lambda}$; we may, however, expect (14) to be better when $\lambda$ 
is small. 

Now, as we have already observed, 

$$f(x) - \sum_{k=0}^{N} a_k^{(N)}\Delta^k\theta(x)$$

is the xth Fourier coefficient of 

$$g(u) - \varphi(u)\sum_{k=0}^{N} a_k^{(N)}(1 - e^{iu})^k;$$

if we write (18) in the form 

$$f(x) - \sum_{k=0}^{N} a_k^{(N)}\Delta^k\theta(x) = (2\pi)^{-1}\int_{-\pi}^{\pi}\Bigl[g(t)/\varphi(t) - \sum_{k=0}^{N} a_k^{(N)}(1 - e^{it})^k\Bigr]\varphi(t)\,e^{-ixt}\,dt, \tag{25}$$

and choose the $a_k^{(N)}$ as specified above, the expression in square brackets is 
$g(t)/\varphi(t)$ minus the first N + 1 terms of its Fourier series, and so the Fourier 
series of [···] involves no $e^{ikt}$ with k < N + 1. Since the Fourier series of $\varphi(t)$ 
involves no $e^{ikt}$ with k < 0, the product $\varphi(t)$[···] also involves no $e^{ikt}$ with 
k < N + 1, and therefore the integral in (25) is zero for x = 0, 1, 2, ..., N 
(since it represents the xth Fourier coefficient of $\varphi(t)$[···]). In other words, 

$$f(x) - \sum_{k=0}^{N} a_k^{(N)}\Delta^k\theta(x) = 0, \qquad x = 0, 1, 2, \cdots, N.$$

Furthermore, we can compute $f(x) - \sum_{k=0}^{N} a_k^{(N)}\Delta^k\theta(x)$ for x > N by the 
convolution formula from the Fourier series of $\varphi(t)$ and [···]; for n > N, the nth Fourier 
coefficient of [···] is just that of $g(t)/\varphi(t)$, given by (24), and that of $\varphi(t)$ is 
$e^{-\lambda}\lambda^n/n!$, so for x > N 

$$f(x) - \sum_{k=0}^{N} a_k^{(N)}\Delta^k\theta(x) = \sum_{n=N+1}^{x}\Bigl(\sum_{k=0}^{n} f(n-k)\,(-\lambda)^k/k!\Bigr)e^{\lambda}\,\theta(x-n),$$

and in particular 

$$f(N+1) - \sum_{k=0}^{N} a_k^{(N)}\Delta^k\theta(N+1) = \sum_{k=0}^{N+1} f(N+1-k)\,(-\lambda)^k/k!.$$


9. Proofs: practical problem. We have so far obtained only an estimate for 
the minimum of D, by obtaining the minimum of $D^*$; this estimate is satisfactory 
for large N and so for theoretical purposes. However, to obtain precisely the 
best mean-square approximation to f(x) by a small number N of terms of the 
sum in (18), we have to choose $a_k^{(N)}$ so that 

$$\sum_{k=0}^{N} a_k^{(N)}(1 - e^{it})^k\,\varphi(t)$$

is the first N + 1 terms of the expansion of g(t) in terms of the set of functions 
obtained by replacing $(1 - e^{it})^k\varphi(t)$, k = 0, 1, 2, ..., by an equivalent 
orthonormal set. The process for obtaining this orthonormal set is well known; it 
turns out that the integrals that have to be evaluated are expressible in terms of 
Bessel functions of imaginary argument; the result is that the first orthonormal 
functions are 

$$\psi_0(t) = (2\pi)^{-1/2}\,\alpha_0^{-1/2}\exp(\lambda e^{it}),$$

$$\psi_1(t) = (2\pi)^{-1/2}\,\frac{\alpha_1 - \alpha_0 e^{it}}{[\alpha_0(\alpha_0^2 - \alpha_1^2)]^{1/2}}\,\exp(\lambda e^{it}),$$

$$\psi_2(t) = (2\pi)^{-1/2}\,\frac{\alpha_1^2 - \alpha_0\alpha_2 - \alpha_1(\alpha_0 - \alpha_2)e^{it} - (\alpha_1^2 - \alpha_0^2)e^{2it}}{[(\alpha_1^2 - \alpha_0^2)(\alpha_0 - \alpha_2)(2\alpha_1^2 - \alpha_0^2 - \alpha_0\alpha_2)]^{1/2}}\,\exp(\lambda e^{it}),$$

where $\alpha_0 = J_0(2i\lambda)$, $\alpha_1 = -iJ_1(2i\lambda)$, $\alpha_2 = -J_2(2i\lambda)$. It is then a simple matter, 
first to express $\psi_0$, $\psi_1$, $\psi_2$ in terms of $\varphi(t)$, $\varphi(t)(1 - e^{it})$, $\varphi(t)(1 - e^{it})^2$, and then 
to determine $a_0^{(0)}$; $a_0^{(1)}$, $a_1^{(1)}$; and $a_0^{(2)}$, $a_1^{(2)}$, $a_2^{(2)}$. For example, the best two-term 
approximation for g(u) in terms of $\psi_0(u)$, $\psi_1(u)$ is 

$$\psi_0(u)\int_{-\pi}^{\pi} g(u)\,\overline{\psi_0(u)}\,du + \psi_1(u)\int_{-\pi}^{\pi} g(u)\,\overline{\psi_1(u)}\,du,$$

and the integrals $\int g(u)\overline{\psi_k(u)}\,du$ are combinations of terms of the form 

$$(2\pi)^{-1}\int_{-\pi}^{\pi} g(u)\,\overline{\varphi(u)}\,e^{-iku}\,du;$$

these in turn are Fourier coefficients of $g(u)\overline{\varphi(u)}$ and so are expressible, by the 
Parseval formula, as sums of products of the Fourier coefficients of g(u) (namely, f(n)) 
and of $\varphi(u)$ (namely, $\theta(n)$). We omit the algebraic work; the results are given in 
formulas (9), (10), (11). 
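The orthonormality of the three functions can be verified numerically. In the sketch below they are written as $\psi_0 = (2\pi\alpha_0)^{-1/2}E$, $\psi_1 = (2\pi)^{-1/2}(\alpha_1 - \alpha_0e^{it})E/[\alpha_0(\alpha_0^2-\alpha_1^2)]^{1/2}$ and $\psi_2 = (2\pi)^{-1/2}\bigl(\alpha_1^2 - \alpha_0\alpha_2 - \alpha_1(\alpha_0-\alpha_2)e^{it} - (\alpha_1^2-\alpha_0^2)e^{2it}\bigr)E/[(\alpha_1^2-\alpha_0^2)(\alpha_0-\alpha_2)(2\alpha_1^2-\alpha_0^2-\alpha_0\alpha_2)]^{1/2}$ with $E = \exp(\lambda e^{it})$, and the inner products $\int_{-\pi}^{\pi}\psi_j\overline{\psi_k}\,dt$ are approximated by an equally spaced rectangle rule. The Bessel values come from the series $i^{-n}J_n(iy) = \sum_k (y/2)^{2k+n}/(k!\,(k+n)!)$; the function names and node count are assumptions of this sketch.

```python
import cmath
from math import factorial, pi, sqrt

def bessel_i(n, y, terms=60):
    """i^{-n} J_n(iy) = sum_k (y/2)^(2k+n) / (k! (k+n)!)."""
    return sum((y / 2) ** (2 * k + n) / (factorial(k) * factorial(k + n))
               for k in range(terms))

def orthonormal_functions(lam):
    """Return [psi_0, psi_1, psi_2] as functions of t."""
    a0, a1, a2 = (bessel_i(n, 2 * lam) for n in range(3))
    d1 = sqrt(a0 * (a0 * a0 - a1 * a1))
    d2 = sqrt((a1 * a1 - a0 * a0) * (a0 - a2) * (2 * a1 * a1 - a0 * a0 - a0 * a2))

    def common(t):
        return cmath.exp(lam * cmath.exp(1j * t)) / sqrt(2 * pi)

    def psi0(t):
        return common(t) / sqrt(a0)

    def psi1(t):
        return common(t) * (a1 - a0 * cmath.exp(1j * t)) / d1

    def psi2(t):
        e = cmath.exp(1j * t)
        num = a1 * a1 - a0 * a2 - a1 * (a0 - a2) * e - (a1 * a1 - a0 * a0) * e * e
        return common(t) * num / d2

    return [psi0, psi1, psi2]

def inner(p, q, m=256):
    """Integral over (-pi, pi) of p(t) * conj(q(t)) dt, m-point rectangle rule."""
    nodes = [-pi + 2 * pi * i / m for i in range(m)]
    return sum(p(t) * q(t).conjugate() for t in nodes) * (2 * pi / m)

def max_deviation(lam):
    """Largest departure of the Gram matrix of psi_0, psi_1, psi_2 from the identity."""
    psi = orthonormal_functions(lam)
    return max(abs(inner(psi[j], psi[k]) - (1.0 if j == k else 0.0))
               for j in range(3) for k in range(3))

print(max_deviation(0.631), max_deviation(2.757))
```

The two values of $\lambda$ tried are those quoted in Tables 1 and 2; in both cases the Gram matrix agrees with the identity to rounding error.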


10. Proofs: continuous case. In the continuous case of our approximation 
problem we assume that $|f(x)|^2$ is integrable on $(-\infty, \infty)$ and look for 
coefficients $a_k^{(N)}$ that will minimize 

$$D = \int_{-\infty}^{\infty}\Bigl|f(x) - \sum_{k=0}^{N} a_k^{(N)}\Delta^k\theta(x)\Bigr|^2 dx,$$

where 

$$\theta(x) = (2\pi)^{-1}\int_{-\pi}^{\pi}\varphi(u)\,e^{-ixu}\,du, \qquad \Delta^k\theta(x) = (2\pi)^{-1}\int_{-\pi}^{\pi}\varphi(u)\,e^{-ixu}(1 - e^{iu})^k\,du.$$

Let f(x) be the Fourier transform of g(u); we can regard $\theta(x)$ as the Fourier 
transform of $\varphi(u)$, $\varphi(u)$ being defined as zero outside $(-\pi, \pi)$. Then by Parseval's 
theorem for Fourier transforms we have 

$$2\pi D = \int_{|t|>\pi}|g(t)|^2\,dt + \int_{-\pi}^{\pi}\Bigl|g(t) - \varphi(t)\sum_{k=0}^{N} a_k^{(N)}(1 - e^{it})^k\Bigr|^2 dt.$$

Clearly, then, D cannot be made arbitrarily small unless g(t) = 0 almost 
everywhere outside $(-\pi, \pi)$; and if this condition is satisfied, D reduces to the same 
form which it had in the discrete case (see (19)). Thus the problem of mean-square 
approximation in the continuous case reduces, if it can be solved at all, to the 
corresponding problem in the discrete case. 


11. Representation by a series. We consider the representation of a given 
f(x) by the B-series with the classical coefficients (5), but with mean-square 
convergence of the series. Here we assume that $f(x) \geq 0$, f(x) = 0 for x = 
-1, -2, ..., and $\sum_{x=0}^{\infty}|f(x)|^2 < \infty$, and ask whether we can have 

$$\lim_{n\to\infty}\sum_{x=0}^{\infty}\Bigl|f(x) - \sum_{k=0}^{n} a_k\,\Delta^k\theta(x)\Bigr|^2 = 0, \tag{26}$$

where here the $a_k$ do not depend on n (but are not, in principle, required to have 
the form (5)). From our previous discussion this is equivalent to 

$$\lim_{n\to\infty}\int_{-\pi}^{\pi}\Bigl|g(t) - \varphi(t)\sum_{k=0}^{n} a_k(1 - e^{it})^k\Bigr|^2 dt = 0,$$

and this implies that 

$$\lim_{n\to\infty}|a_n|^2\int_{-\pi}^{\pi}|\varphi(t)|^2\,|1 - e^{it}|^{2n}\,dt = 0.$$

From this it follows easily that 

$$\sum_{n=0}^{\infty} a_n(1 - e^{it})^n$$

converges for $|t| < \pi$, or in other words that 

$$H(z) = \sum_{n=0}^{\infty} a_n(1 - z)^n$$

converges on |z| = 1 except perhaps for z = -1, and hence converges in 
|1 - z| < 2. By analytic continuation it is easy to identify H(z) with $F(z)/\Phi(z)$, 
where for |z| < 1, 

$$F(z) = \sum_{n=0}^{\infty} f(n)z^n, \qquad \Phi(z) = \sum_{n=0}^{\infty}\theta(n)z^n = e^{\lambda(z-1)}.$$

Since $\Phi(z)$ has no singular points, F(z) is analytic in |1 - z| < 2 and hence in 
particular in $0 \leq x < 3$; since F(z) is a power series with nonnegative coefficients, 
it has a singular point at the positive real point on its circle of convergence 
[30, p. 214], and so it must be analytic at least in |z| < 3. This gives the 
restriction $\limsup_{n\to\infty}|f(n)|^{1/n} \leq \tfrac{1}{3}$. Nevertheless, as we know, f(x) is represented 
in mean-square by a sequence of sums of terms $a_k^{(N)}\Delta^k\theta(x)$ even if we assume 
only that $\sum|f(n)|^2$ converges. 

In the continuous case, if $f(x) \geq 0$ and we have 

$$\lim_{n\to\infty}\int_{-\infty}^{\infty}\Bigl|f(x) - \sum_{k=0}^{n} a_k\,\Delta^k\theta(x)\Bigr|^2 dx = 0, \tag{27}$$

we must have g(x) = 0 almost everywhere outside $(-\pi, \pi)$ and then, as we saw 
previously, (26) holds also. Now since $f(x) \geq 0$, g(t) has derivatives of all orders 
if it has derivatives of all orders at t = 0 [29, p. 90] and it is easily seen from 
this that g(t) is analytic for all real t if it is analytic at t = 0. Now on the one 
hand, unless $f(x) \equiv 0$, g(t) cannot be analytic for all real t if (as we are supposing) 
g(t) vanishes outside $(-\pi, \pi)$. On the other hand, $H(e^{it}) = g(t)/\varphi(t)$ for real 
values of t close to 0 and so, if t is regarded as a complex variable, for complex 
values of t near 0. Since $1/\varphi(t)$ is analytic everywhere, g(t) is analytic at t = 0. 
From this contradiction we infer that a nonnegative f(x) can never be represented 
in the form (27), although it may perfectly well be represented by 

$$\lim_{N\to\infty}\int_{-\infty}^{\infty}\Bigl|f(x) - \sum_{k=0}^{N} a_k^{(N)}\Delta^k\theta(x)\Bigr|^2 dx = 0.$$




REFERENCES 

I. Charlier series 

[1] A. C. Aitken, Statistical Mathematics, Oliver and Boyd, London, 1939, pp. 58-59, 66-67, 76-79. 

[1a] W. Andersson, "Short notes on Charlier's method for expansion of frequency functions in series," Skandinavisk Aktuar. Tids., Vol. 27 (1944), pp. 16-31. 

[2] L. A. Aroian, "The type B Gram-Charlier series," Annals of Math. Stat., Vol. 8 (1937), pp. 183-192. 

[3] R. P. Boas, Jr., "The Charlier B-series," Am. Math. Soc. Trans. (to appear). 

[4] C. V. L. Charlier, "Über das Fehlergesetz," Ark. Mat. Astr. Fys., Vol. 2, no. 8 (1905). 

[5] C. V. L. Charlier, "Die zweite Form des Fehlergesetzes," ibid., no. 15 (1905). 

[6] C. V. L. Charlier, "Über die Darstellung willkürlicher Funktionen," ibid., no. 20 (1905). 

[7] C. V. L. Charlier, "Researches into the theory of probability," Lund Universitets Arsskrift N. F., Afd. 2, Vol. 1, no. 5 (1906). 

[8] H. Cramér, Random Variables and Probability Distributions, Cambridge Tracts in Mathematics and Mathematical Physics, no. 36, Cambridge University Press, 1937. 

[9] W. P. Elderton, Frequency Curves and Correlation, 3d ed., Cambridge University Press, 1938, pp. 131-132. 

[10] A. Fisher, An Elementary Treatise on Frequency Curves and their Application in the Analysis of Death Curves and Life Tables, Macmillan, New York, 1922. 

[11] A. Fisher, The Mathematical Theory of Probabilities and Its Application to Frequency Curves and Statistical Methods, Vol. 1, 2d ed., Macmillan, New York, 1922, pp. 271-279. 

[12] M. Jacob, "Über die Charlier'sche B-Reihe," Skandinavisk Aktuar. Tids., Vol. 15 (1932), pp. 286-291. 

[13] C. Jordan, "Sur la probabilité des épreuves répétées, le théorème de Bernoulli et son inversion," Bull. Soc. Math. France, Vol. 54 (1926), pp. 101-137. 

[14] N. R. Jørgensen, "Note sur la fonction de répartition de type B de M. Charlier," Ark. Mat. Astr. Fys., Vol. 10, no. 15 (1915). 

[15] N. R. Jørgensen, "Undersøgelser over Frequensflader og Korrelation," Copenhagen thesis, 1916. 

[16] M. G. Kendall, The Advanced Theory of Statistics, Vol. 1, 3d ed., Griffin, London, 1947, pp. 154-156. 

[17] S. Kullback, "On the Charlier type B series," Annals of Math. Stat., Vol. 18 (1947), pp. 574-581; Vol. 19 (1948), p. 127. 

[18] R. von Mises, Vorlesungen aus dem Gebiete der angewandten Mathematik, Vol. 1, Wahrscheinlichkeitsrechnung und ihre Anwendung in der Statistik und theoretischen Physik, Deuticke, Leipzig and Vienna, 1931, pp. 265-269. 

[19] N. Obrechkoff, Sur la loi de Poisson, la série de Charlier et les équations linéaires aux différences finies de premier ordre à coefficients constants, Actual. Sci. Ind., no. 740, Hermann et Cie., Paris (1938), pp. 35-64. 

[20] H. Pollaczek-Geiringer, "Die Charlier'sche Entwicklung willkürlicher Verteilungen," Skandinavisk Aktuar. Tids., Vol. 11 (1928), pp. 98-111. 

[21] H. Pollaczek-Geiringer, "Über die Poissonsche Verteilung und die Entwicklung willkürlicher Verteilungen," Zeits. f. Angew. Math. und Mech., Vol. 8 (1928), pp. 292-309. 

[22] H. L. Rietz, Mathematical Statistics, Carus Mathematical Monographs, no. 3, Chicago, 1927, pp. 60-68, 170-177. 

[23] E. Schmidt, Über die Charliersche Entwicklung einer "arithmetischen Verteilung" nach den sukzessiven Differenzen der Poissonschen asymptotischen Darstellungsfunktion für die Wahrscheinlichkeit seltener Ereignisse, Sitzungsberichte der Preussischen Akademie der Wissenschaften, Phys.-Math. Kl. 1928, p. 148. 

[24] E. Schmidt, "Über die Charlier-Jordansche Entwicklung einer willkürlichen Funktion nach der Poissonschen Funktion und ihren Ableitungen," Zeits. f. Angew. Math. und Mech., Vol. 14 (1933), pp. 139-142. 

[25] H. L. Selberg, "Über die Darstellung willkürlicher Funktionen durch Charliersche Differenzreihen," Skandinavisk Aktuar. Tids., Vol. 25 (1942), pp. 228-246. 

[26] H. L. Selberg, "Über die Darstellung der Dichtefunktion einer Verteilung durch eine Charliersche B-Reihe," Archiv for Mathematik og Naturvidenskab, Vol. 46 (1943), pp. 127-138. 

[27] J. F. Steffensen, Some Recent Researches in the Theory of Statistics and Actuarial Science, Cambridge University Press, 1930, pp. 35-48. 

[27a] J. F. Steffensen, "Note sur la fonction de type B de M. Charlier," Svenska Aktuar. Tids., Vol. 3 (1916), pp. 226-228. 

[28] J. V. Uspensky, "On Charles Jordan's series for probability," Annals of Math., Series 2, Vol. 32 (1931), pp. 306-312. 

II. Other references 

[29] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, 1946. 

[30] E. C. Titchmarsh, The Theory of Functions, Oxford, 1932. 

[31] A. Zygmund, Trigonometrical Series, Warszawa-Lwów, 1935. 

[32] Table of the Bessel Functions $J_0(z)$ and $J_1(z)$ for Complex Arguments, prepared by the Mathematical Tables Project, National Bureau of Standards, 2d ed., Columbia University Press, New York, 1947. 



HEURISTIC APPROACH TO THE KOLMOGOROV-SMIRNOV

THEOREMS¹

By J. L. Doob 

University of Illinois 

1. Introduction and summary. Asymptotic theorems on the difference between
the (empirical) distribution function calculated from a sample and the true
distribution function governing the sampling process are well known. Simple
proofs of an elementary nature have been obtained for the basic theorems of
Kolmogorov² and Smirnov³ by Feller,⁴ but even these proofs conceal to some
extent, in their emphasis on elementary methodology, the naturalness of the
results (qualitatively at least), and their mutual relations. Feller suggested that
the author publish his own approach (which had also been used by Kac), which
does not have these disadvantages, although rather deep analysis would be
necessary for its rigorous justification. The approach is therefore presented (at
one critical point) as heuristic reasoning which leads to results in investigations of
this kind, even though the easiest proofs may use entirely different methods.

No calculations are required to obtain the qualitative results, that is the 
existence of limiting distributions for large samples of various measures of the 
discrepancy between empirical and true distribution functions. The numerical
evaluation of these limiting distributions requires certain results concerning the 
Brownian movement stochastic process and its relation to other Gaussian 
processes which will be derived in the Appendix. 

2. The problem. Let $x_1, x_2, \cdots$ be mutually independent random variables
with a common distribution function $F(\lambda)$,

$$F(\lambda) = \Pr\{x_j \le \lambda\}.$$

In statistical language $x_1, \cdots, x_n$ form a sample of $n$ drawn from the
distribution with distribution function $F(\lambda)$. Let $\nu_n(\lambda)$ be the number of these
$x_j$'s which are $\le \lambda$. According to the strong law of large numbers, for each $\lambda$

$$\lim_{n\to\infty} \frac{\nu_n(\lambda)}{n} = F(\lambda) \tag{2.1}$$

with probability 1. For fixed $n$, $\nu_n(\lambda)/n$ is itself a distribution function (which
depends on the sample values $x_1, \cdots, x_n$), the empirical distribution function,
and an elaboration of the argument which led to (2.1) shows that (2.1) is true

¹ Research connected with a probability project at Cornell University under an ONR
contract.

² Inst. Ital. Atti., Giorn., Vol. 4 (1933), pp. 83-91.

³ Rec. Math. (Matematiceskii Sbornik), N.S. 6, Vol. 48 (1939), pp. 3-26; Bull. Math. Univ.
Moscou, Vol. 2 (1939), fasc. 2.

⁴ Annals of Math. Stat., Vol. 19 (1948), pp. 177-189.



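uniformly in $\lambda$, with probability 1; that is, if

$$D_n = \operatorname*{L.U.B.}_{-\infty<\lambda<\infty} \left| \frac{\nu_n(\lambda)}{n} - F(\lambda) \right|, \tag{2.2}$$

then $D_n$ is a random variable and

$$\lim_{n\to\infty} D_n = 0$$

with probability 1.⁵ This result would be of limited practical statistical importance
except that the distribution of $D_n$ does not depend on the distribution function
$F(\lambda)$ if $F(\lambda)$ is continuous. In fact in that case the random variables $F(x_1)$,
$F(x_2), \cdots$ are mutually independent and each is uniformly distributed in the
interval (0, 1); if $\nu_n'(\lambda)$ is the number of $F(x_j)$'s $\le \lambda$, for $j \le n$,

$$\operatorname*{L.U.B.}_{0\le\lambda\le 1} \left| \frac{\nu_n'(\lambda)}{n} - \lambda \right| = \operatorname*{L.U.B.}_{-\infty<\lambda<\infty} \left| \frac{\nu_n(\lambda)}{n} - F(\lambda) \right|.$$

Thus it is no restriction, replacing $x_j$ by $F(x_j)$ if necessary, in finding the
distribution of $D_n$ to assume that $F(\lambda) = \lambda$ for $0 \le \lambda \le 1$, and

$$D_n = \operatorname*{L.U.B.}_{0\le\lambda\le 1} \left| \frac{\nu_n(\lambda)}{n} - \lambda \right|. \tag{2.2'}$$

The results will hold for $D_n$ defined by (2.2) for any continuous $F(\lambda)$. We shall
also consider $D_n^+$ and $D_n^-$, defined by

$$D_n^+ = \operatorname*{L.U.B.}_{0\le\lambda\le 1} \left[ \frac{\nu_n(\lambda)}{n} - \lambda \right], \qquad D_n^- = -\operatorname*{G.L.B.}_{0\le\lambda\le 1} \left[ \frac{\nu_n(\lambda)}{n} - \lambda \right], \tag{2.3}$$

and again the results will hold (with the obvious definitions of $D_n^+$ and $D_n^-$ in the
general case) for every continuous $F(\lambda)$.

The problem is to find the limiting distributions of (properly normalized)
$D_n$, $D_n^+$, $D_n^-$ when $n \to \infty$.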
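The three discrepancy measures of (2.2') and (2.3) are straightforward to compute for a concrete sample. The following is a minimal sketch, assuming the sample has already been reduced to the uniform case $F(\lambda) = \lambda$ (for a general continuous $F$ one would pass the values $F(x_j)$); the function name `ks_statistics` is illustrative and does not come from the paper.

```python
def ks_statistics(sample):
    """Return (D_n, D_n_plus, D_n_minus) for values assumed to lie in [0, 1].

    nu_n(lambda)/n is the empirical distribution function of the sample and
    the hypothesized distribution function is F(lambda) = lambda.
    """
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must be non-empty")
    # nu_n/n jumps to (i + 1)/n at the order statistic xs[i], so the supremum
    # of nu_n(lambda)/n - lambda is attained at one of the jump points.
    d_plus = max((i + 1) / n - x for i, x in enumerate(xs))
    # Just below xs[i] the empirical function equals i/n, so the supremum of
    # lambda - nu_n(lambda)/n is approached there.
    d_minus = max(x - i / n for i, x in enumerate(xs))
    return max(d_plus, d_minus), d_plus, d_minus


# Example: a sample of two points.
d, d_plus, d_minus = ks_statistics([0.2, 0.1])
print(d, d_plus, d_minus)
```

The normalized quantities studied in the next section are obtained by multiplying each returned value by `len(sample) ** 0.5`.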


3. Derivation of the Kolmogorov and Smirnov theorems. Define

$$x_n(t) = n^{1/2}\left[\frac{\nu_n(t)}{n} - t\right], \qquad 0 \le t \le 1.$$

Since $\nu_n(0) = 0$ with probability 1 and $\nu_n(t) - \nu_n(s)$ is the number of
successes in $n$ independent trials, with probability $t - s$ of success in each trial,
$\nu_n(t) - \nu_n(s)$ has expectation $n(t - s)$ and variance $n(t - s)[1 - (t - s)]$. Hence

$$E\{x_n(t)\} = 0, \quad 0 \le t \le 1; \qquad E\{[x_n(t) - x_n(s)]^2\} = (t - s)[1 - (t - s)], \quad 0 \le s \le t \le 1. \tag{3.1}$$

⁵ Cf. M. Fréchet, Généralités sur les probabilités. Variables aléatoires, Paris, 1937, pp.
260-261.





Now let $\{x(t)\}$ be a one parameter family of random variables, $0 \le t \le 1$,
with the following properties:

(a) for each $j$, if $0 \le t_1 < \cdots < t_j \le 1$ the $j$-variate distribution of the random
variables $x(t_1), \cdots, x(t_j)$ is Gaussian;

(b) (3.1) holds, that is

$$E\{x(t)\} = 0, \quad 0 \le t \le 1; \qquad E\{[x(t) - x(s)]^2\} = (t - s)[1 - (t - s)], \quad 0 \le s \le t \le 1; \tag{3.1'}$$

(c) $\Pr\{x(0) = 0\} = 1$.

According to the central limit theorem, the $j$ variate distribution of
$x_n(t_1), \cdots, x_n(t_j)$ is asymptotically that of $x(t_1), \cdots, x(t_j)$; in fact the normalizing
factor $n^{1/2}$ in the definition of $x_n(t)$ and the choice of means and variances in (3.1')
were made precisely to bring this about. As far as first and second moments are
concerned the $x_n(t)$ and $x(t)$ processes are identical; when $n \to \infty$ the
distributions, or at least the $j$ variate ones mentioned, become identical also.

We shall assume, until a contradiction frustrates our devotion to heuristic
reasoning, that in calculating asymptotic $x_n(t)$ process distributions when $n \to \infty$
we may simply replace the $x_n(t)$ processes by the $x(t)$ process. It is clear that this
cannot be done in all possible situations, but let the reader who has never used
this sort of reasoning exhibit the first counter example.

The $x(t)$ process has continuous sample functions (cf. Appendix). Define

$$D = \max_{0\le t\le 1} |x(t)|, \qquad D^+ = \max_{0\le t\le 1} x(t), \qquad D^- = -\min_{0\le t\le 1} x(t).$$

Then in accordance with our substitution principle $n^{1/2}D_n$, $n^{1/2}D_n^+$, $n^{1/2}D_n^-$ have as $n$
becomes infinite the distributions of $D$, $D^+$, $D^-$ respectively. (The latter two
are the same because the $-x(t)$ process is stochastically identical with the $x(t)$
process.) Thus these simple qualitative considerations have led to the existence
of the limiting distributions derived and evaluated by Kolmogorov, who proved:

Theorem⁶ (Kolmogorov).

$$\lim_{n\to\infty} \Pr\{n^{1/2}D_n > \lambda\} = 2\sum_{m=1}^{\infty} (-1)^{m+1} e^{-2m^2\lambda^2}, \tag{3.2}$$

$$\lim_{n\to\infty} \Pr\{n^{1/2}D_n^+ > \lambda\} = \lim_{n\to\infty} \Pr\{n^{1/2}D_n^- > \lambda\} = e^{-2\lambda^2}. \tag{3.3}$$

To complete our treatment we shall prove in the Appendix that

$$\Pr\{D > \lambda\} = 2\sum_{m=1}^{\infty} (-1)^{m+1} e^{-2m^2\lambda^2}; \tag{3.2'}$$

⁶ In Feller's paper (loc. cit., p. 178, equation (1.4)) the factor 2 in the exponent was
omitted by the printer. The same misprint occurs in Smirnov's table of the values of the
series in our (3.2), Annals of Math. Stat., Vol. 19 (1948), pp. 279-281.





$$\Pr\{D^+ > \lambda\} = \Pr\{D^- > \lambda\} = e^{-2\lambda^2}, \tag{3.3'}$$

so that in fact the above considerations have led not only to the existence but
to the evaluation of the asymptotic distributions. (Actually we shall prove
somewhat more general results about the $x(t)$ process.)
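The limiting tail probabilities in (3.2) and (3.3) are easy to evaluate numerically. What follows is a small sketch rather than a definitive implementation: the function names, the stopping tolerance `tol` and the cap `max_terms` are assumptions introduced for illustration, and the alternating series is only practical for $\lambda$ not too close to 0.

```python
import math


def kolmogorov_tail(lam, tol=1e-16, max_terms=100000):
    """Right side of (3.2): 2 * sum_{m>=1} (-1)**(m+1) * exp(-2 m^2 lam^2)."""
    if lam <= 0:
        return 1.0  # D is non-negative, so the tail probability is 1 here
    total = 0.0
    for m in range(1, max_terms + 1):
        term = math.exp(-2.0 * m * m * lam * lam)
        total += term if m % 2 == 1 else -term
        if term < tol:  # remaining terms are negligible
            break
    return 2.0 * total


def one_sided_tail(lam):
    """Right side of (3.3): exp(-2 lam^2) for lam > 0."""
    return math.exp(-2.0 * lam * lam) if lam > 0 else 1.0


print(kolmogorov_tail(1.0), one_sided_tail(1.0))
```

For moderate $\lambda$ the first term dominates, so the two-sided tail is close to twice the one-sided tail.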

So much for the Kolmogorov theorems. Smirnov obtained results (also
independent of the given continuous distribution function $F(\lambda)$) of a somewhat
different nature. Let $x_1^*, x_2^*, \cdots$ be mutually independent random variables
with the same individual distributions as the $x_j$'s, that is each distributed
uniformly in the interval (0, 1); define $\nu_n^*(\lambda)$ as the number of the first $n$ $x_j^*$'s
which are $\le \lambda$. Smirnov considered the difference between empirical distribution
functions,

$$D_{mn} = \operatorname*{L.U.B.}_{0\le\lambda\le 1} \left| \frac{\nu_m(\lambda)}{m} - \frac{\nu_n^*(\lambda)}{n} \right|,$$

as well as $D_{mn}^+$ and $D_{mn}^-$ defined in the obvious way. To avoid stressing the
obvious we consider only the $D_{mn}$.

Theorem (Smirnov). If $n \to \infty$ in such a way that $m/n \to r$, and if
$N = mn/(m + n)$,

$$\lim_{n\to\infty} \Pr\{N^{1/2}D_{mn} > \lambda\} = 2\sum_{k=1}^{\infty} (-1)^{k+1} e^{-2k^2\lambda^2}. \tag{3.4}$$


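To derive this result define an $x^*(t)$ process stochastically identical with the
$x(t)$ process but independent of it. Then if $x_n^*(t)$ is defined by

$$x_n^*(t) = n^{1/2}\left[\frac{\nu_n^*(t)}{n} - t\right],$$

we identify, in accordance with our heuristic principle, the process with variables

$$\{x(t) - r^{1/2}x^*(t)\}$$

with the one with variables

$$\left\{x_m(t) - \left(\frac{m}{n}\right)^{1/2} x_n^*(t)\right\}.$$

Doing this leads to the fact that the distribution of

$$N^{1/2}D_{mn} = \left(\frac{n}{m+n}\right)^{1/2} \max_{0\le t\le 1} \left| x_m(t) - \left(\frac{m}{n}\right)^{1/2} x_n^*(t) \right|$$

converges to that of

$$\frac{1}{(1+r)^{1/2}} \max_{0\le t\le 1} \left| x(t) - r^{1/2}x^*(t) \right|.$$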





Now the $x(t)$ process and the process with variables

$$\left\{ \frac{x(t) - r^{1/2}x^*(t)}{(1+r)^{1/2}} \right\}$$

are stochastically identical. Hence we are led to the conclusion that the
distribution of $N^{1/2}D_{mn}$ converges to that of $D$, and this is Smirnov's theorem,
stated above. (The method we use does not seem applicable to Smirnov's deeper
theorems on the number of intersections between empirical and true distribution
curves or between pairs of empirical distribution curves.)
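The two-sample statistic of Smirnov's theorem can be computed by sweeping once through the pooled, sorted observations. The sketch below is illustrative only: the name `smirnov_statistic` is an assumption, and the routine simply evaluates $D_{mn}$ and the normalized value $N^{1/2}D_{mn}$ with $N = mn/(m+n)$ as defined in the theorem.

```python
import math


def smirnov_statistic(xs, ys):
    """Return (D_mn, sqrt(N) * D_mn), where N = m * n / (m + n)."""
    m, n = len(xs), len(ys)
    if m == 0 or n == 0:
        raise ValueError("both samples must be non-empty")
    xs, ys = sorted(xs), sorted(ys)
    i = j = 0
    d = 0.0
    while i < m or j < n:
        # Next point of the pooled sample, in increasing order.
        if j >= n or (i < m and xs[i] <= ys[j]):
            point = xs[i]
        else:
            point = ys[j]
        # Count the observations of each sample that are <= point.
        while i < m and xs[i] <= point:
            i += 1
        while j < n and ys[j] <= point:
            j += 1
        # The difference of the step functions only changes at pooled points.
        d = max(d, abs(i / m - j / n))
    big_n = m * n / (m + n)
    return d, math.sqrt(big_n) * d


print(smirnov_statistic([0.1, 0.4], [0.2, 0.3]))
```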

APPENDIX 

4. The Brownian movement process. Consider any Gaussian stochastic
process, with random variables $\{x(t)\}$ where $t$ varies in some interval. That
is, we assume that for each $t$ in the interval $x(t)$ is a random variable and that
for any $j \ge 1$ if $t_1 < \cdots < t_j$ are in the interval the $j$ variate distribution of
$x(t_1), \cdots, x(t_j)$ is Gaussian. In the following we shall always assume that
$E\{x(t)\} = 0$. Then the process is determined stochastically by the covariance
function

$$r(s, t) = E\{x(s)x(t)\}.$$

In particular, if the range of parameter is the interval $[0, \infty)$ and if

$$r(s, t) = \sigma^2 \operatorname{Min}(s, t), \qquad 0 \le s, t < \infty,$$

the process is called the Brownian movement process, or sometimes the Wiener
process; $\sigma$ is a positive constant. When considering this process we shall write
$\zeta(t)$ instead of $x(t)$. For the $\zeta(t)$ process

$$\Pr\{\zeta(0) = 0\} = 1,$$

$$E\{[\zeta(t) - \zeta(s)]^2\} = \sigma^2 |t - s|,$$

and if $0 \le s_1 < t_1 \le s_2 < t_2$ the increments $\zeta(t_1) - \zeta(s_1)$ and $\zeta(t_2) - \zeta(s_2)$ are
mutually independent. We shall use the following properties of this process, of
which the first two are well known.

(a) The sample functions are everywhere continuous with probability 1. In
the following we can therefore write as if all the sample curves were continuous.

(b) For fixed $s$

$$\Pr\Big\{\max_{0\le t\le T} [\zeta(s + t) - \zeta(s)] \ge \lambda\Big\} = 2\Pr\{\zeta(s + T) - \zeta(s) \ge \lambda\}.^7 \tag{4.1}$$

(Note that the use of a general initial value $s$, rather than 0, has not added
to the generality and we drop this affectation below.)

(c) If $a > 0$, $b > 0$, $\alpha > 0$, $\beta > 0$, then

$$\Pr\Big\{\operatorname*{L.U.B.}_{0\le t<\infty} [\zeta(t) - (at + b)] \ge 0\Big\} = e^{-2ab/\sigma^2}, \tag{4.2}$$

$$\Pr\Big\{\operatorname*{L.U.B.}_{0\le t<\infty} [\zeta(t) - (at + b)] \ge 0 \ \text{or} \ \operatorname*{G.L.B.}_{0\le t<\infty} [\zeta(t) + \alpha t + \beta] \le 0\Big\}$$
$$= \sum_{m=1}^{\infty} \Big\{ e^{-\frac{2}{\sigma^2}[m^2ab + (m-1)^2\alpha\beta + m(m-1)(a\beta + \alpha b)]} + e^{-\frac{2}{\sigma^2}[(m-1)^2ab + m^2\alpha\beta + m(m-1)(a\beta + \alpha b)]}$$
$$\qquad - e^{-\frac{2}{\sigma^2}[m^2(ab + \alpha\beta) + m(m-1)a\beta + m(m+1)\alpha b]} - e^{-\frac{2}{\sigma^2}[m^2(ab + \alpha\beta) + m(m+1)a\beta + m(m-1)\alpha b]} \Big\}; \tag{4.3}$$

in particular ($\alpha = a$, $\beta = b$)

$$\Pr\Big\{\operatorname*{L.U.B.}_{0\le t<\infty} \frac{|\zeta(t)|}{at + b} \ge 1\Big\} = 2\sum_{m=1}^{\infty} (-1)^{m+1} e^{-2m^2ab/\sigma^2}. \tag{4.3'}$$

⁷ Due to Bachelier; cf. the proof by P. Lévy, Comp. Math., Vol. 7 (1939), p. 293. One way
to prove (a) is to prove (4.1) first, with L.U.B. instead of Max, and then use it to calculate
the probabilities relevant to (a).

The probability in (4.2) is the probability that a $\zeta(t)$ sample curve will ever
reach the line with slope $a$ and ordinate intercept $b$; the probability in (4.3) is
the probability that a sample curve will ever reach either of the indicated
halflines, one above and one below the $t$ axis. Since the right hand sides are
continuous functions of $a$, $b$, $\alpha$, $\beta$ we could write $> 0$ instead of $\ge 0$ and $< 0$
instead of $\le 0$ on the left, so that these probabilities are also the probabilities
that a sample curve will ever rise above the indicated line or leave the indicated
angle.

It will be convenient to describe a line by its slope and ordinate intercept;
the line $[u, v]$ is the line with slope $u$ and ordinate intercept $v$. We shall take
$\sigma = 1$ in the proof; this is no essential restriction since $\zeta(t)/\sigma$ is the random
variable of a process of the same type whose $\sigma$ is 1.

To prove (4.2) let $\varphi(a, b)$ be the probability on the left, the probability that a
sample curve will reach the line $[a, b]$. If $b = b_1 + b_2$, $b_1, b_2 > 0$, a sample curve
which is to reach $[a, b]$ must first reach $[a, b_1]$ and then move up to meet a line
with slope $a$, $b_2$ units above the first meeting with $[a, b_1]$. Then

$$\varphi(a, b_1 + b_2) = \varphi(a, b_1)\,\varphi(a, b_2).$$

Now $\varphi(a, b) \ge \Pr\{\zeta(1) \ge a + b\} > 0$ and $\varphi(a, b)$ is monotone non-increasing in $b$,
for fixed $a$. The only solution of the functional equation with these properties is

$$\varphi(a, b) = e^{-\psi(a)b}.$$

Now $\varphi(a, b)$ is the probability of reaching $[0, b]$ at some first time $s$ and then
going on to the line $[a, b]$ which from the vantage point of the first common point
$(s, \zeta(s))$ is the line $[a, as]$. In other words, using (4.1)

$$e^{-\psi(a)b} = \int_0^{\infty} e^{-\psi(a)as}\, d_s \Pr\Big\{\max_{0\le t\le s} \zeta(t) \ge b\Big\} = \int_0^{\infty} \frac{b}{(2\pi)^{1/2}s^{3/2}} \exp\Big[-\frac{b^2}{2s} - a\psi(a)s\Big]\, ds = e^{-b(2a\psi(a))^{1/2}},$$

from which it follows that $\psi(a) = 2a$, and this yields (4.2).

To prove (4.3) we consider first the following general problem: Let $[u_1, v_1]$,
$[u_2, v_2], \cdots$, $u_j > 0$, $v_j > 0$ be a sequence of lines; let $t = t_1$ be the first value of $t$,
if any, at which a sample curve meets $[u_1, v_1]$; if $t_1$ is defined for a sample curve
let $t_2$ be the first value of $t > t_1$, if any, at which the curve meets $[-u_2, -v_2]$; if
$t_2$ is defined for a sample curve, let $t_3$ be the first value of $t > t_2$, if any, at which
the curve meets $[u_3, v_3]$, and so on. Let $\pi_n$ be the probability that there is a point
$t_n$, in other words the probability that a sample curve meets the lines $[u_1, v_1]$,
$[-u_2, -v_2], \cdots, [(-1)^{n+1}u_n, (-1)^{n+1}v_n]$ in at least $n$ successive points. We write

$$\pi_n = \pi_n(u_1, v_1; \cdots; u_n, v_n).$$

In particular, according to (4.2)

$$\pi_1(u_1, v_1) = e^{-2u_1v_1}. \tag{4.4}$$

To evaluate $\pi_n$, let $Q$ be the point $(t_{n-1}, \zeta(t_{n-1}))$ on the sample curve, and
suppose for definiteness that $n$ is even. Starting at $Q$, if there is a $t_n$, the curve
must finally reach $[-u_n, -v_n]$, that is it must go to a line of slope $-u_n$, which is
$u_{n-1}t_{n-1} + v_{n-1} + u_nt_{n-1} + v_n$ units vertically below its initial position $Q$ when
$t = t_{n-1}$. According to (4.2) the probability of doing this is

$$e^{-2u_n(u_{n-1}t_{n-1} + v_{n-1} + u_nt_{n-1} + v_n)}.$$

Now we replace the line $[-u_n, -v_n]$ by a line which depends on $t_{n-1}$ but which
leaves this probability unchanged; the new line has slope $-(u_{n-1} + u_n)$ and is

$$h = \frac{u_n}{u_{n-1} + u_n}\,(u_{n-1}t_{n-1} + v_{n-1} + u_nt_{n-1} + v_n)$$

units below $Q$ when $t = t_{n-1}$. Finally we reflect this new line in the line parallel
to the $t$ axis through $Q$. These two changes do not affect the probability we are
discussing because the changes of $\zeta(t)$ after $t_{n-1}$ are independent of the changes
before and have symmetric distributions. The final line has slope $u_{n-1} + u_n$
and is $h$ units above $Q$ when $t = t_{n-1}$; it is the line

$$\left[u_{n-1} + u_n,\ \frac{u_{n-1}v_{n-1} + u_nv_n + 2u_nv_{n-1}}{u_{n-1} + u_n}\right]$$

which does not depend on $t_{n-1}$. This line lies above $[u_{n-1}, v_{n-1}]$ in the first
quadrant, so that if a sample curve reaches it the curve must also intersect $[u_{n-1}, v_{n-1}]$.
We have thus proved that

$$\pi_n(u_1, v_1; \cdots; u_n, v_n) = \pi_{n-1}\left(u_1, v_1; \cdots; u_{n-2}, v_{n-2};\ u_{n-1} + u_n,\ \frac{u_{n-1}v_{n-1} + u_nv_n + 2u_nv_{n-1}}{u_{n-1} + u_n}\right). \tag{4.5}$$



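The fundamental identity (4.5) makes it possible to reduce the evaluation of
$\pi_n$ to $\pi_1$ in $n - 1$ steps; $\pi_1$ is evaluated in (4.4). Thus successive meetings with $n$
lines have been reduced to a meeting with a single line. As a first example suppose

$$u_1 = \cdots = u_n = u, \qquad v_1 = \cdots = v_n = v.$$

Then we have

$$\pi_n(u, v; \cdots; u, v) = \pi_{n-1}(u, v; \cdots; 2u, 2v) = \cdots = \pi_1(nu, nv),$$

so that

$$\pi_n(u, v; \cdots; u, v) = e^{-2n^2uv}. \tag{4.6}$$

More generally suppose

$$u_1 = u_3 = \cdots = a, \qquad v_1 = v_3 = \cdots = b,$$
$$u_2 = u_4 = \cdots = \alpha, \qquad v_2 = v_4 = \cdots = \beta.$$

Then we show that for suitably chosen $C_j^{(n)}$'s we have according as $n$ is even
or odd

$$\pi_n(a, b; \cdots; \alpha, \beta) = \pi_1\left(\frac{n}{2}(a + \alpha),\ \frac{C_1^{(n)}ab + C_2^{(n)}\alpha\beta + C_3^{(n)}a\beta + C_4^{(n)}\alpha b}{\frac{n}{2}(a + \alpha)}\right),$$
$$\pi_n(a, b; \cdots; a, b) = \pi_1\left(\frac{n+1}{2}a + \frac{n-1}{2}\alpha,\ \frac{C_1^{(n)}ab + C_2^{(n)}\alpha\beta + C_3^{(n)}a\beta + C_4^{(n)}\alpha b}{\frac{n+1}{2}a + \frac{n-1}{2}\alpha}\right). \tag{4.7}$$

For $n = 1$ this form is correct with

$$C_1^{(1)} = 1, \qquad C_2^{(1)} = C_3^{(1)} = C_4^{(1)} = 0.$$

If now $n$ is even and if the equations are true for $n$, then applying (4.7) to the
last $n$ lines (with the roles of $a, b$ and $\alpha, \beta$ interchanged) and (4.5) once more,

$$\pi_{n+1}(a, b; \cdots; a, b) = \pi_2\left(a, b;\ \frac{n}{2}(a + \alpha),\ \frac{C_1^{(n)}\alpha\beta + C_2^{(n)}ab + C_3^{(n)}\alpha b + C_4^{(n)}a\beta}{\frac{n}{2}(a + \alpha)}\right)$$
$$= \pi_1\left(\frac{n+2}{2}a + \frac{n}{2}\alpha,\ \frac{ab + C_1^{(n)}\alpha\beta + C_2^{(n)}ab + C_3^{(n)}\alpha b + C_4^{(n)}a\beta + n(a + \alpha)b}{\frac{n+2}{2}a + \frac{n}{2}\alpha}\right),$$

and comparing this with (4.7) we find that

$$C_1^{(n+1)} = C_2^{(n)} + n + 1, \qquad C_2^{(n+1)} = C_1^{(n)}, \qquad C_3^{(n+1)} = C_4^{(n)}, \qquad C_4^{(n+1)} = C_3^{(n)} + n.$$

If $n$ is odd we find similarly that

$$C_1^{(n+1)} = C_2^{(n)} + n, \qquad C_2^{(n+1)} = C_1^{(n)}, \qquad C_3^{(n+1)} = C_4^{(n)}, \qquad C_4^{(n+1)} = C_3^{(n)} + n + 1.$$

The solution of these equations is

$$\begin{array}{lll}
 & n \text{ even} & n \text{ odd} \\
C_1^{(n)} & n^2/4 & (n+1)^2/4 \\
C_2^{(n)} & n^2/4 & (n-1)^2/4 \\
C_3^{(n)} & n(n-2)/4 & (n^2-1)/4 \\
C_4^{(n)} & n(n+2)/4 & (n^2-1)/4
\end{array}$$

Then

$$\pi_n = e^{-\frac{1}{2}[n^2ab + n^2\alpha\beta + n(n-2)a\beta + n(n+2)\alpha b]} \quad (n \text{ even}),$$
$$\pi_n = e^{-\frac{1}{2}[(n+1)^2ab + (n-1)^2\alpha\beta + (n^2-1)a\beta + (n^2-1)\alpha b]} \quad (n \text{ odd}). \tag{4.8}$$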

We can now prove (4.3). In fact the left side is equal to

$$\pi_1(a, b) + \pi_1(\alpha, \beta) - \pi_2(a, b; \alpha, \beta) - \pi_2(\alpha, \beta; a, b) + \cdots,$$

which gives (4.3), on substituting (4.8). Only (4.3'), which follows from the
simple (4.6), is used in the application to the Kolmogorov-Smirnov theorems.
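The reduction (4.5) lends itself to a direct numerical check: collapse the last two lines repeatedly until a single line remains, apply (4.4), and compare with the closed forms (4.6) and (4.8). The sketch below assumes $\sigma = 1$, as in the proof; the function names are illustrative and not taken from the paper.

```python
import math


def pi_by_reduction(lines):
    """Evaluate pi_n for lines [(u1, v1), ..., (un, vn)], all entries positive.

    Repeatedly applies the identity (4.5) to the last two lines, then (4.4).
    """
    lines = list(lines)
    if not lines:
        raise ValueError("at least one line is required")
    while len(lines) > 1:
        u_n, v_n = lines.pop()   # last line
        u_p, v_p = lines.pop()   # next-to-last line
        u_new = u_p + u_n
        v_new = (u_p * v_p + u_n * v_n + 2.0 * u_n * v_p) / u_new
        lines.append((u_new, v_new))
    u, v = lines[0]
    return math.exp(-2.0 * u * v)


def pi_closed_form(n, a, b, alpha, beta):
    """Closed form (4.8) for n alternating lines (a, b), (alpha, beta), ..."""
    if n % 2 == 0:
        q = (n * n * a * b + n * n * alpha * beta
             + n * (n - 2) * a * beta + n * (n + 2) * alpha * b)
    else:
        q = ((n + 1) ** 2 * a * b + (n - 1) ** 2 * alpha * beta
             + (n * n - 1) * (a * beta + alpha * b))
    return math.exp(-0.5 * q)


a, b, alpha, beta = 0.3, 0.2, 0.5, 0.1
for n in range(1, 7):
    alternating = [(a, b) if k % 2 == 0 else (alpha, beta) for k in range(n)]
    print(n, pi_by_reduction(alternating), pi_closed_form(n, a, b, alpha, beta))
```

The two columns printed by the loop should agree to rounding error, which gives an independent check on the coefficients $C_j^{(n)}$.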


5. Transformations of Gaussian processes to the Brownian movement process.
The $\zeta(t)$ process studied in section 4 is so simple that it is important to be able to
reduce others to it by elementary changes of variable. For example, suppose the
covariance function of a Gaussian process has the form

$$r(s, t) = u(s)v(t), \qquad s \le t, \tag{5.1}$$

for $s$, $t$ in some interval, and that the ratio

$$a(t) = \frac{u(t)}{v(t)}$$

is continuous and monotone increasing, with inverse function $a_1(t)$. We define

$$\zeta(t) = \frac{x[a_1(t)]}{v[a_1(t)]}.$$

With this definition the $\zeta$ process is Gaussian and since if $s < t$

$$E\{\zeta(s)\zeta(t)\} = \frac{u[a_1(s)]\,v[a_1(t)]}{v[a_1(s)]\,v[a_1(t)]} = a[a_1(s)] = s,$$

the $\zeta$ process is the Brownian movement process with $\sigma = 1$. This transformation
from the $x$ to the $\zeta$ process is effected by a combination of a change of variable
in $t$ and the application of a variable scaling factor. (Conversely, if such a
transformation is applied to the Brownian movement process it is trivial to verify
that the new covariance function will have the form (5.1). The Gaussian processes
with covariance functions of this form are easily seen to be the Gaussian Markov
processes.)

6. The Gaussian process with $r(s, t) = s(1 - t)$. In section 3 the Kolmogorov-
Smirnov theorems were reduced to properties of a Gaussian process with
parameter $t$, $0 \le t \le 1$, for which

$$\Pr\{x(0) = 0\} = 1;$$

$$E\{x(t)\} = 0;$$

$$E\{[x(t) - x(s)]^2\} = (t - s)[1 - (t - s)], \qquad 0 \le s \le t \le 1.$$

Now these equations imply that

$$E\{x(t)^2\} = t(1 - t), \qquad E\{x(s)^2\} = s(1 - s),$$

and combining the set we find that

$$r(s, t) = E\{x(s)x(t)\} = s(1 - t), \qquad 0 \le s \le t \le 1.$$

This covariance function has the form studied in section 5, and using the
transformation of that section

$$\zeta(t) = (t + 1)\,x\!\left(\frac{t}{t + 1}\right), \qquad 0 \le t < \infty,$$

defines a Brownian movement process (with $\sigma = 1$). Then if $D$, $D^+$, $D^-$ are
defined as in section 3, we have from (4.3')

$$\Pr\{D \ge \lambda\} = \Pr\Big\{\operatorname*{L.U.B.}_{0\le t<\infty} \frac{|\zeta(t)|}{t + 1} \ge \lambda\Big\} = 2\sum_{m=1}^{\infty} (-1)^{m+1} e^{-2m^2\lambda^2},$$

and from (4.2)

$$\Pr\{D^+ \ge \lambda\} = \Pr\{D^- \ge \lambda\} = e^{-2\lambda^2}.$$
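The change of variable used here can be checked directly at the level of covariance functions: with $r(s, t) = s(1 - t)$ for $s \le t$, the process $(t + 1)\,x(t/(t + 1))$ should have covariance $\operatorname{Min}(s, t)$. The following sketch only verifies that algebraic identity numerically; it does not simulate either process, and the function names are assumptions made for illustration.

```python
def bridge_cov(s, t):
    """Covariance r(s, t) = min(s, t) * (1 - max(s, t)) of the x(t) process."""
    lo, hi = min(s, t), max(s, t)
    return lo * (1.0 - hi)


def transformed_cov(s, t):
    """Covariance of zeta(s) and zeta(t), where zeta(t) = (t + 1) * x(t / (t + 1))."""
    return (s + 1.0) * (t + 1.0) * bridge_cov(s / (s + 1.0), t / (t + 1.0))


for s, t in [(0.5, 2.0), (3.0, 0.25), (1.0, 1.0), (10.0, 40.0)]:
    print(s, t, transformed_cov(s, t), min(s, t))
```

Each printed pair should agree up to rounding, which is the statement that the transformed process has the Brownian covariance with unit $\sigma$.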





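This proves (3.2') and (3.3'). Note that we could go beyond these results, because
of our detailed knowledge of the $x(t)$ process. For example we can evaluate

$$\lim_{n\to\infty} \Pr\{n^{1/2}D_n^- < \lambda_1,\ n^{1/2}D_n^+ < \lambda_2\}.$$

If $\lambda_1 = \lambda_2 = \lambda$ the probability is the probability that $n^{1/2}D_n < \lambda$ which we have
already treated. In general it is, in the limit,

$$\Pr\Big\{\min_{0\le t\le 1} x(t) > -\lambda_1,\ \max_{0\le t\le 1} x(t) < \lambda_2\Big\} = \Pr\Big\{\operatorname*{G.L.B.}_{0\le t<\infty} \frac{\zeta(t)}{t + 1} > -\lambda_1,\ \operatorname*{L.U.B.}_{0\le t<\infty} \frac{\zeta(t)}{t + 1} < \lambda_2\Big\}$$
$$= 1 - \sum_{m=1}^{\infty} \Big\{ e^{-2[m^2\lambda_2^2 + (m-1)^2\lambda_1^2 + 2m(m-1)\lambda_1\lambda_2]} + e^{-2[(m-1)^2\lambda_2^2 + m^2\lambda_1^2 + 2m(m-1)\lambda_1\lambda_2]}$$
$$\qquad - e^{-2[m^2(\lambda_1^2 + \lambda_2^2) + m(m-1)\lambda_1\lambda_2 + m(m+1)\lambda_1\lambda_2]} - e^{-2[m^2(\lambda_1^2 + \lambda_2^2) + m(m+1)\lambda_1\lambda_2 + m(m-1)\lambda_1\lambda_2]} \Big\}$$
$$= 1 - \sum_{m=1}^{\infty} \Big\{ e^{-2[m\lambda_2 + (m-1)\lambda_1]^2} + e^{-2[(m-1)\lambda_2 + m\lambda_1]^2} - 2e^{-2m^2(\lambda_1 + \lambda_2)^2} \Big\},$$

obtained by setting $a = b = \lambda_2$, $\alpha = \beta = \lambda_1$ in (4.3).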
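As a rough numerical illustration of this closing formula, the series can be summed directly. The sketch below assumes $\lambda_1, \lambda_2 > 0$ and truncates the sum after a fixed number of terms; `joint_limit` and its `terms` parameter are names introduced here, not part of the paper.

```python
import math


def joint_limit(lam1, lam2, terms=200):
    """Limit of Pr{sqrt(n) D_n^- < lam1, sqrt(n) D_n^+ < lam2} for lam1, lam2 > 0."""
    total = 0.0
    for m in range(1, terms + 1):
        total += math.exp(-2.0 * (m * lam2 + (m - 1) * lam1) ** 2)
        total += math.exp(-2.0 * ((m - 1) * lam2 + m * lam1) ** 2)
        total -= 2.0 * math.exp(-2.0 * (m * (lam1 + lam2)) ** 2)
    return 1.0 - total


# With lam1 = lam2 the result is 1 minus the two-sided tail in (3.2');
# with lam1 very large only the upper barrier matters.
print(joint_limit(1.0, 1.0), joint_limit(10.0, 1.0))
```

The second printed value should be close to $1 - e^{-2}$, the complement of the one-sided tail (3.3') at $\lambda = 1$.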



PEARSONIAN CORRELATION COEFFICIENTS ASSOCIATED WITH 
LEAST SQUARES THEORY 

By Paul S. Dwyer 

University of Michigan 

1. Introduction and summary. It is well known that the zero-order correlation 
between the predicted value of a variable and the observed value of the variable
is the multiple correlation. It is also well known that the zero-order correlation 
between the residuals for two different variables, when the prediction is from a 
common set of variables, is the partial correlation. These considerations naturally 
lead to a systematic investigation of all the zero-order correlations involving 
the various variables associated with least squares theory. Such an investigation 
is the purpose of this paper. 

As a result of this study it appears that other zero-order correlations include 
the multiple alienation coefficient, the part correlation coefficient, and certain 
other coefficients which, as far as I am aware, have not been previously defined. 

The paper first examines the case of a single predicted variable and then 
continues with the case in which two or more variables are predicted simultane¬ 
ously. The paper includes (1) a theoretical development of the different coeffi¬ 
cients and the relations between them, (2) the expression of the formulas in 
determinantal form, (3) a matrix presentation of the material, and (4) an outline 
of the calculational techniques, with illustrations.

It should be made clear at the start that this paper deals with populations 
(finite or infinite) and not with samples from those populations. The sampling 
distribution of each of the new correlation coefficients defined in this paper 
might well become the subject of a later investigation, but first we need to 
know what these correlation coefficients are. 

2. The case of the single predicted variable. Notation, definitions, and basic
properties. We suppose that a population consists of $N$ individuals with values
$X_{1j}, X_{2j}, \cdots, X_{kj}, Y_j$ for the variables $X_1, X_2, \cdots, X_k, Y$ and that $Y$ is
linearly predicted from the $X_i$ by the formula

$$E = Y - a_0 - a_1X_1 - a_2X_2 - \cdots - a_kX_k = Y - \hat{Y} \tag{1}$$

by least squares theory. For the purposes of this paper, we use a concise
summation notation, $\sum Q$, in place of the more formal serial notation $\sum_{j=1}^{N} Q_j$ which is
preferable to the frequency notation $\sum_x Q_x f_x$ and, in the continuous case,
$\int_a^b Q_x f_x\,dx$. Moreover it is desirable that the scales of $X$ and $Y$ be chosen so as
to facilitate the easy determination of the various formulas. If we let

$$y_j = \frac{Y_j - \bar{Y}}{\sqrt{N}\,\sigma_Y}, \qquad x_{ij} = \frac{X_{ij} - \bar{X}_i}{\sqrt{N}\,\sigma_{X_i}}, \tag{2}$$

we have $\sum x_i^2 = \sum y^2 = 1$ with the resulting correlation formula

$$\rho_{x_iy} = \frac{\sum x_iy}{\sqrt{(\sum x_i^2)(\sum y^2)}} = \sum x_iy \quad \text{and} \quad \rho_{x_ix_j} = \sum x_ix_j. \tag{3}$$

The transformations (2) when applied to (1) give

$$e = \frac{E}{\sqrt{N}\,\sigma_Y} = y - (\beta_1x_1 + \beta_2x_2 + \cdots + \beta_kx_k) = y - \hat{y} \tag{4}$$

where the $\beta$'s are standard regression coefficients and $e$ is defined to be $E/(\sqrt{N}\,\sigma_Y)$.

It is to be noted that the values of $x_i$, $y$, $e$, and $\hat{y}$ are all dimensionless.

The values we wish to correlate are those of $X_i$, $Y$, $E$, $\hat{Y}$ of (1). The zero-order
correlations involving these are the same as for $x_i$, $y$, $e$, $\hat{y}$ of (4).


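3. Correlations with a single predicted variable. We wish to minimize
$\sum e^2$. Differentiating with respect to $\beta_i$ and equating to zero we get

$$\sum ex_i = 0 \tag{5}$$

from which by multiplication by $\beta_i$ and summation for $i$,

$$\sum e\hat{y} = 0. \tag{6}$$

It follows that

$$\sum e^2 = \sum e(y - \hat{y}) = \sum ey = \sum (y - \hat{y})y = \sum y^2 - \sum y\hat{y} = 1 - \sum y\hat{y} = 1 - \sum (e + \hat{y})\hat{y} = 1 - \sum \hat{y}^2. \tag{7}$$

Using (4) and (7), we get

$$\sum e^2 = \frac{\sum E^2}{N\sigma_Y^2} = \frac{\sigma_E^2}{\sigma_Y^2}$$

so that

$$\sum \hat{y}^2 = 1 - \frac{\sigma_E^2}{\sigma_Y^2} = \frac{\sigma_Y^2 - \sigma_E^2}{\sigma_Y^2}. \tag{8}$$

This is the conventional definition (from least squares theory) of the multiple
correlation coefficient, so

$$\rho^2_{y \cdot x_1x_2 \cdots x_k} = \rho^2_{y(x)} = \sum \hat{y}^2 = \sum y\hat{y}. \tag{9}$$

Application of (9) to (7) gives

$$\sum e^2 = 1 - \rho^2_{y(x)} = \kappa^2_{y(x)} = \kappa^2_{y \cdot x_1x_2 \cdots x_k} \tag{10}$$

where $\kappa_{y(x)}$ is the multiple alienation coefficient. We now have $\sum x_i^2 = 1$, $\sum y^2 = 1$,
$\sum e^2 = \kappa^2_{y(x)}$, $\sum \hat{y}^2 = \rho^2_{y(x)}$, and we present formulas involving
$x_i$, $y$, $e$, $\hat{y}$. We first form the cross products

$$\sum x_iy = \rho_{x_iy}, \tag{11}$$

$$\sum x_ie = 0, \tag{12}$$

$$\sum x_i\hat{y} = \sum x_i(\hat{y} + e) = \sum x_iy = \rho_{x_iy}, \tag{13}$$

$$\sum ye = \sum y(y - \hat{y}) = \sum y^2 - \sum y\hat{y} = 1 - \sum \hat{y}^2 = \kappa^2_{y(x)}, \tag{14}$$

$$\sum y\hat{y} = \sum \hat{y}^2 = \rho^2_{y(x)}, \tag{15}$$

$$\sum e\hat{y} = 0. \tag{16}$$

We then have

$$\rho_{x_ix_j} = \frac{\sum x_ix_j}{\sqrt{(\sum x_i^2)(\sum x_j^2)}} = \sum x_ix_j, \tag{17}$$

$$\rho_{x_iy} = \frac{\sum x_iy}{\sqrt{(\sum x_i^2)(\sum y^2)}} = \sum x_iy, \tag{18}$$

$$\rho_{x_ie} = \frac{\sum x_ie}{\sqrt{(\sum x_i^2)(\sum e^2)}} = 0, \tag{19}$$

$$\rho_{x_i\hat{y}} = \frac{\sum x_i\hat{y}}{\sqrt{(\sum x_i^2)(\sum \hat{y}^2)}} = \frac{\rho_{x_iy}}{\rho_{y(x)}}. \tag{20}$$

It is interesting to note that this is unity in case $k = 1$ for then $\rho_{xy} = \rho_{y(x)}$.
Otherwise the absolute value of $\rho_{x\hat{y}}$ is larger than that of $\rho_{xy}$. For this reason this
coefficient might be called the multiple augmented correlation coefficient.

$$\rho_{ey} = \frac{\sum ey}{\sqrt{(\sum e^2)(\sum y^2)}} = \frac{\kappa^2_{y(x)}}{\kappa_{y(x)}} = \kappa_{y(x)}. \tag{21}$$

Thus the correlation between $y$ and its residual is the multiple alienation
coefficient.

$$\rho_{y\hat{y}} = \frac{\sum y\hat{y}}{\sqrt{(\sum y^2)(\sum \hat{y}^2)}} = \frac{\rho^2_{y(x)}}{\rho_{y(x)}} = \rho_{y(x)}. \tag{22}$$

Thus, as is well known, the zero-order correlation between observed and
predicted $y$ is the multiple correlation.

$$\rho_{e\hat{y}} = \frac{\sum e\hat{y}}{\sqrt{(\sum e^2)(\sum \hat{y}^2)}} = 0. \tag{23}$$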
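The identities (7) through (23) can be checked on a small finite population. The sketch below is illustrative only: it assumes exactly two predictors so that the normal equations can be solved in closed form, it uses the scaling of (2), and all function names, dictionary keys and data values are invented for the example.

```python
import math


def _standardize(values):
    """Scale as in (2): subtract the mean and divide by sqrt(N) * sigma."""
    n = len(values)
    mean = sum(values) / n
    scale = math.sqrt(sum((v - mean) ** 2 for v in values))
    return [(v - mean) / scale for v in values]


def _dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def _corr(u, v):
    return _dot(u, v) / math.sqrt(_dot(u, u) * _dot(v, v))


def single_prediction_correlations(X1, X2, Y):
    """Zero-order correlations among x1, y, e, y_hat for two predictors."""
    x1, x2, y = _standardize(X1), _standardize(X2), _standardize(Y)
    r12, r1y, r2y = _dot(x1, x2), _dot(x1, y), _dot(x2, y)
    det = 1.0 - r12 * r12
    # Standard regression coefficients from the two normal equations (5).
    beta1 = (r1y - r12 * r2y) / det
    beta2 = (r2y - r12 * r1y) / det
    y_hat = [beta1 * p + beta2 * q for p, q in zip(x1, x2)]
    e = [obs - fit for obs, fit in zip(y, y_hat)]
    return {
        "rho_sq": _dot(y_hat, y_hat),      # (9)
        "kappa_sq": _dot(e, e),            # (10)
        "rho_x1_y": r1y,                   # (18)
        "rho_x1_yhat": _corr(x1, y_hat),   # (20)
        "rho_e_y": _corr(e, y),            # (21)
        "rho_y_yhat": _corr(y, y_hat),     # (22)
        "rho_e_yhat": _corr(e, y_hat),     # (23)
    }


result = single_prediction_correlations(
    [1, 2, 3, 4, 5], [2, 1, 4, 3, 6], [1, 3, 2, 5, 4])
for name, value in result.items():
    print(name, round(value, 6))
```

In this example `rho_x1_yhat` exceeds `rho_x1_y` in absolute value, which is the behaviour that motivates the name proposed after (20).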


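4. Notation for the general case. We need to extend the notation and the
definitions before examining explicit formulas for the more general case of two
(or more) predicted variables. Suppose that $Y_i$ and $Y_j$ are the two variables
predicted from the same $X$'s. Then from (4) we write

$$e_i = \frac{E_i}{\sqrt{N}\,\sigma_{Y_i}} = y_i - \beta_{i1}x_1 - \beta_{i2}x_2 - \cdots - \beta_{ik}x_k = y_i - \hat{y}_i,$$
$$e_j = \frac{E_j}{\sqrt{N}\,\sigma_{Y_j}} = y_j - \beta_{j1}x_1 - \beta_{j2}x_2 - \cdots - \beta_{jk}x_k = y_j - \hat{y}_j. \tag{24}$$

We then have the two sets of normal equations

$$\sum e_ix = 0, \qquad \sum e_jx = 0 \tag{25}$$

so that

$$\sum e_i\hat{y}_i = 0, \qquad \sum e_i\hat{y}_j = 0, \qquad \sum e_j\hat{y}_i = 0, \qquad \sum e_j\hat{y}_j = 0. \tag{26}$$

It follows that

$$\sum e_ie_j = \sum e_i(y_j - \hat{y}_j) = \sum e_iy_j = \sum (y_i - \hat{y}_i)y_j = \sum y_iy_j - \sum \hat{y}_iy_j$$
$$= \sum y_iy_j - \sum y_i\hat{y}_j = \sum y_iy_j - \sum \hat{y}_i\hat{y}_j = \rho_{ij} - \sum \hat{y}_i\hat{y}_j \tag{27}$$

if we use the notation that $\rho_{ij} = \rho_{y_iy_j}$.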


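5. The correlations involving more than one predicted variable. In this case
the $y$'s, the $e$'s and the $\hat{y}$'s (as well as the $x$'s) can have more than one variable
so that the correlation coefficients we need, in addition to those of section 3, are
$\rho_{y_iy_j}$, $\rho_{e_ie_j}$, $\rho_{\hat{y}_i\hat{y}_j}$, $\rho_{y_ie_j}$, $\rho_{e_iy_j}$, $\rho_{y_i\hat{y}_j}$, $\rho_{\hat{y}_iy_j}$, $\rho_{e_i\hat{y}_j}$. We need now only the
summed products

$$\sum y_iy_j = \rho_{y_iy_j} = \rho_{ij},$$

$$\sum e_ie_j = \rho_{ij} - \sum \hat{y}_i\hat{y}_j \quad \text{as given in (27)}, \tag{28}$$

$$\sum y_ie_j = \sum y_i(y_j - \hat{y}_j) = \sum y_iy_j - \sum y_i\hat{y}_j = \rho_{ij} - \sum \hat{y}_i\hat{y}_j, \tag{29}$$

$$\sum y_i\hat{y}_j = \sum \hat{y}_i\hat{y}_j, \tag{30}$$

$$\sum e_i\hat{y}_j = 0. \tag{31}$$

We have then

$$\rho_{y_iy_j} = \frac{\sum y_iy_j}{\sqrt{(\sum y_i^2)(\sum y_j^2)}} = \rho_{ij}, \tag{32}$$

$$\rho_{e_ie_j} = \frac{\sum e_ie_j}{\sqrt{(\sum e_i^2)(\sum e_j^2)}} = \frac{\rho_{ij} - \sum \hat{y}_i\hat{y}_j}{\kappa_{i(x)}\kappa_{j(x)}}. \tag{33}$$

This is the partial correlation coefficient.

$$\rho_{\hat{y}_i\hat{y}_j} = \frac{\sum \hat{y}_i\hat{y}_j}{\sqrt{(\sum \hat{y}_i^2)(\sum \hat{y}_j^2)}} = \frac{\sum \hat{y}_i\hat{y}_j}{\rho_{i(x)}\rho_{j(x)}}. \tag{34}$$

This coefficient appears to be new. Since it is the correlation of predicted values,
I suggest that it be called the predictions correlation coefficient.

$$\rho_{y_ie_j} = \frac{\sum y_ie_j}{\sqrt{(\sum y_i^2)(\sum e_j^2)}} = \frac{\rho_{ij} - \sum \hat{y}_i\hat{y}_j}{\kappa_{j(x)}}, \tag{35}$$

$$\rho_{e_iy_j} = \frac{\rho_{ij} - \sum \hat{y}_i\hat{y}_j}{\kappa_{i(x)}}. \tag{36}$$

The correlations given by (35) and (36) have been defined previously and are
known as part correlation coefficients [1; 213, 497].

$$\rho_{y_i\hat{y}_j} = \frac{\sum y_i\hat{y}_j}{\sqrt{(\sum y_i^2)(\sum \hat{y}_j^2)}} = \frac{\sum \hat{y}_i\hat{y}_j}{\rho_{j(x)}}, \tag{37}$$

$$\rho_{\hat{y}_iy_j} = \frac{\sum \hat{y}_i\hat{y}_j}{\rho_{i(x)}}. \tag{38}$$

The correlations of (37) and (38) appear to be new. Each is, in a sense, a
generalization of the multiple correlation coefficient since it becomes the multiple
correlation coefficient when $i = j$. I suggest that it might be called the cross multiple
correlation coefficient, since it correlates the actual value of one variable with
the predicted value of another.

$$\rho_{e_i\hat{y}_j} = \rho_{\hat{y}_ie_j} = 0. \tag{39}$$

A summary of definitions and names of Pearsonian correlation coefficients
associated with least squares theory is presented in Table I. No name is proposed
when the coefficient is identically zero.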

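6. Relations between the correlations. Many relations exist between the
correlations defined in earlier sections. Some of the more interesting of these
are obtained by the elimination of $\sum \hat{y}_i\hat{y}_j$ from formulas involving this term. Thus
from (34), (37), and (38) we get

$$\sum \hat{y}_i\hat{y}_j = \rho_{\hat{y}_i\hat{y}_j}\rho_{i(x)}\rho_{j(x)} = \rho_{y_i\hat{y}_j}\rho_{j(x)} = \rho_{\hat{y}_iy_j}\rho_{i(x)},$$

and from (33), (35), and (36) we get

$$\sum \hat{y}_i\hat{y}_j = \rho_{ij} - \rho_{e_ie_j}\kappa_{i(x)}\kappa_{j(x)} = \rho_{ij} - \rho_{y_ie_j}\kappa_{j(x)} = \rho_{ij} - \rho_{e_iy_j}\kappa_{i(x)}.$$

We then have

$$\rho_{\hat{y}_i\hat{y}_j}\rho_{i(x)}\rho_{j(x)} = \rho_{y_i\hat{y}_j}\rho_{j(x)} = \rho_{\hat{y}_iy_j}\rho_{i(x)} = \rho_{ij} - \rho_{e_ie_j}\kappa_{i(x)}\kappa_{j(x)} = \rho_{ij} - \rho_{y_ie_j}\kappa_{j(x)} = \rho_{ij} - \rho_{e_iy_j}\kappa_{i(x)} \tag{40}$$

where the six members may be equated in all possible ways.

Interesting and simple relations can also be obtained by formation of ratios.
Thus

$$\frac{\rho_{e_ie_j}}{\rho_{y_ie_j}} = \frac{1}{\kappa_{i(x)}}, \qquad \frac{\rho_{e_ie_j}}{\rho_{e_iy_j}} = \frac{1}{\kappa_{j(x)}}, \qquad \text{so} \qquad \frac{\rho_{y_ie_j}}{\rho_{e_iy_j}} = \frac{\kappa_{i(x)}}{\kappa_{j(x)}}. \tag{41}$$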




TABLE I

Definition                                  Name

Single predicted variable

$\rho_{x_ix_j}$                             Correlation coefficient of zero order
$\rho_{x_iy}$                               Correlation coefficient of zero order
$\rho_{x_ie} = 0$                           None
$\rho_{x_i\hat{y}}$                         *Multiple augmented correlation coefficient
$\rho_{ye} = \kappa_{y(x)}$                 Multiple alienation coefficient
$\rho_{y\hat{y}} = \rho_{y(x)}$             Multiple correlation coefficient
$\rho_{e\hat{y}} = 0$                       None

Two or more predicted variables

$\rho_{y_iy_j}$                             Correlation coefficient of zero order
$\rho_{e_ie_j}$                             Partial correlation coefficient
$\rho_{\hat{y}_i\hat{y}_j}$                 *Predictions correlation coefficient
$\rho_{y_ie_j}$                             Part correlation coefficient
$\rho_{y_i\hat{y}_j}$                       *Cross multiple correlation coefficient
$\rho_{e_i\hat{y}_j} = 0$                   None

* Proposed name


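Similarly

$$\frac{\rho_{y_i\hat{y}_j}}{\rho_{\hat{y}_iy_j}} = \frac{\rho_{i(x)}}{\rho_{j(x)}}. \tag{42}$$

The geometric mean of similar coefficients yields such expressions as

$$\sqrt{\rho_{y_ie_j}\,\rho_{e_iy_j}} = \rho_{e_ie_j}\sqrt{\kappa_{i(x)}\kappa_{j(x)}}, \qquad \sqrt{\rho_{y_i\hat{y}_j}\,\rho_{\hat{y}_iy_j}} = \rho_{\hat{y}_i\hat{y}_j}\sqrt{\rho_{i(x)}\rho_{j(x)}}. \tag{43}$$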

7. Determinantal formulas* The implicit normal equations (5) become when 
expanded 

Pll^l + Pl2^2 + ' • • + PlkPk == Plir 
(44) f>2101 + P22ff2 -f- • • • -f- p2k0k = p2u 


Pklffl + Pk2fi2 pkkfik = Pky 


410 


PAUL S. DWYEIi 


while = p5(,) becomes 

(45) Pyifii + p^2 + • • • + Pylftk = pj(»). 

Let A be the determinant of the matrix of the solution of the k x’s and y. Let A' 
be the corresponding determinant with pyy replaced by pJo. Let A^ be the 
determinant of the correlation matrix of the k x’s. Then pj(,) = = ^yy can 

be expressed as a function of A and Ayy. If (44) and (45) are to hold simultane¬ 
ously, then A' == 0. Expanding A' in terms of the bottom row, we get 

(40) A' = 0 = pJ(*)Avy -H “terms”. 

Similarly 

(47) A = pyyAyy -f “terms” 

where the ”tecms“ of (40) and (47) are identical. It follows by subtraction that 
A = (1 — pJ(i))A^^ and hence that 

(48) ^yy “ ^y — p»(x) = 1 ~ • 

Then 

(49) 2c* = Icy = K*(., = 1 - 2:^* = 1 - (l - 
Correlation formulas of section 3 then appear as 


(50) 

Pzy 

(51) 

II 

Q. 


(52) Pi/y — 

In a similar way the normal equations (25) become two sets of normal equations. 
The first set is like (44) with 0, replaced by 0^^, and p», replaced by p,.,. The 
second set is similar with i replaced by j. It is desired to find 

(53) = Pyfl0l + PKy2^S + • • • + Pyjtfik- 

Now using (53) with (51) as applied to yt and using the technique of the first 
part of this section, we get 

(54) Ayfyf = PyiyjAy(yf.yjyf H" “temiS”, 

(55) 0 = "^V^iAyiyi^yjyj 4“ terms , 

where A is the determinant of the matrix of the correlations of the k z’s, and 




PEAKSONIAN CORRELATION COEFFICIENT 


411 


Hit is the determinant obtained by deleting the column involving correla¬ 
tions of y, and the row involving correlations of y,; is the determinant 

of the matrix of the k a;’s; and the “terms” in (54) and (55) are identical. It 
follows that 

(56) = p,, - 

^ViVi-ViVi 

and thence 

(57) 

^VtViVjVj 

The formulas of section (5) then appear in determinant form as follows 


as is well known. 


(59) 


(60) 


(61) 


Pl/tVi 






Formulas for pe,yj and py,„^ are similar to (60) and (Gl). 

Modern methods of calculating determinants [2], [3], [4], [5] are advised if
calculations are to be made from these formulas.


8. Matrix formulas. A matrix presentation is very useful in exhibiting the 
general features of this theory and in developing compact and easy methods 
of calculation with finite populations. The matrix presentation here is similar 
to that given by the author in a previous article [6]. 

Let the normal equations (24) be represented by the matrix equation

(62) $E = Y - XB = Y - \hat{Y}$.






Then the sets of normal equations become

$X'E = 0$ or $X'(Y - XB) = 0$

so that

(63) $X'XB = X'Y$.

Now since $XB = \hat{Y}$, (63) can be written as $X'\hat{Y} = X'Y$ and it can be shown
that

(64) $\hat{Y}'\hat{Y} = \hat{Y}'Y = Y'\hat{Y}$.

But under the assumptions of section 2, $X'X$ is the matrix of the intercorrelations of the x's, $X'Y$ is the matrix of the intercorrelations of the x's and y's, and $Y'Y$ is the matrix of the intercorrelations of the y's. Hence (63) can be
written

(65) $R_{xx}B = R_{xy}$

so that

(66) $B = R_{xx}^{-1}R_{xy}$.

If Y is composed of a single variable, B is a single column matrix (vector)
but if Y is composed of m variables, B is an m column matrix. It follows at
once that

(67) $\hat{Y}'\hat{Y} = \hat{Y}'Y = B'X'XB = B'R_{xx}B = R'_{xy}R_{xx}^{-1}R_{xx}R_{xx}^{-1}R_{xy} = R_{yx}R_{xx}^{-1}R_{xy}$

and that

(68) $E'E = (Y - XB)'E = Y'E = Y'(Y - XB) = Y'Y - Y'\hat{Y} = Y'Y - \hat{Y}'\hat{Y} = R_{yy} - R_{yx}R_{xx}^{-1}R_{xy}$.

It thus appears that the matrix (67) has diagonal terms $\sum \hat{y}_i^2 = \sum \hat{y}_i y_i$ which are
the squares of the multiple correlation coefficients, and that the non-diagonal
terms are $\sum \hat{y}_i\hat{y}_j = \sum \hat{y}_i y_j = \sum y_i\hat{y}_j$. Similarly the matrix (68) has diagonal terms $\sum e_i^2 = \kappa^2_{i(x)}$ and non-diagonal terms $\sum e_ie_j = \sum e_iy_j = \sum y_ie_j$. It follows that all the correlation
coefficients defined above may be calculated from the matrices $R_{xx}$, $R_{xy}$, $R_{yy}$,
$\hat{Y}'\hat{Y}$, and $E'E$. The matrix (67) might be called the multiple correlation matrix
and the matrix (68) the multiple alienation matrix.

Conventional results are expressed in terms of the correlation matrices Rxx, 
Rxv, and Ryy. All the correlation coefficients defined in this paper may be ex¬ 
pressed in terms of these matrices and the multiple correlation and alienation 
matrices. 
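The matrix formulas (66), (67) and (68) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`solve`, `correlation_and_alienation`) and the small 2-predictor correlation matrices are hypothetical and do not come from the paper, and plain Gauss-Jordan elimination stands in for whatever method of solution is preferred.

```python
def solve(a, b):
    """Solve a @ x = b (a square, b a matrix) by Gauss-Jordan elimination."""
    n = len(a)
    m = [list(a[i]) + list(b[i]) for i in range(n)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def transpose(a):
    return [list(c) for c in zip(*a)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def correlation_and_alienation(r_xx, r_xy, r_yy):
    """Return B of (66), the multiple correlation matrix (67) and the alienation matrix (68)."""
    b = solve(r_xx, r_xy)                        # (66)  B = R_xx^{-1} R_xy
    yhat = matmul(transpose(r_xy), b)            # (67)  R_yx R_xx^{-1} R_xy
    ee = [[r_yy[i][j] - yhat[i][j] for j in range(len(r_yy))]
          for i in range(len(r_yy))]             # (68)  R_yy - R_yx R_xx^{-1} R_xy
    return b, yhat, ee


# Hypothetical correlations: two x's, one y (not data from the paper).
r_xx = [[1.0, 0.5], [0.5, 1.0]]
r_xy = [[0.6], [0.3]]
r_yy = [[1.0]]
b, yhat, ee = correlation_and_alienation(r_xx, r_xy, r_yy)
print(b, yhat, ee)
```

With one predicted variable the single entry of the matrix (67) is the squared multiple correlation coefficient and the single entry of (68) is its complement.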

9. Calculational method of determining the multiple correlation and multiple 
alienation matrices. Various methods might be used in calculating the multiple 
correlation and alienation matrices from the correlation matrices. One method 
utilizes the square root method of solving simultaneous equations, which has 





recently been presented in a number of places [7], [8], together with a device
which is similar to that used by Aitken [9] in eliminating the back solution. This
method solves the equation (65) by forming the auxiliary equation

(69) $S_{xx}B = S_{xx}R_{xx}^{-1}R_{xy}$

where $S_{xx}$ is a triangular matrix such that

(70) $R_{xx} - S'_{xx}S_{xx} = 0$.


TABLE II

General                                   Illustration

$R_{yy}$                                                                   1.000     .495
                                                                             —      1.000

$R_{xx}$, $R_{xy}$                        1.000    .652    .554    .615     .313     .650
                                            —     1.000    .747    .693     .280     .803
                                            —       —     1.000    .774     .182     .804
                                            —       —       —     1.000     .166     .812

$S_{xx}$, $S_{xx}R_{xx}^{-1}R_{xy}$       1.000    .652    .554    .615     .313     .650
                                                   .758    .509    .385  [illegible] .500
                                                           .659    .360  [illegible] .287
                                                                   .586  [illegible] .199

$\hat{Y}'\hat{Y}$                                                           .117     .221
                                                                             —       .794

$E'E$                                                                       .883     .274
                                                                             —       .206


The right hand side of (69), when premultiplied by its transpose, yields

(71) $(S_{xx}R_{xx}^{-1}R_{xy})'(S_{xx}R_{xx}^{-1}R_{xy}) = R_{yx}R_{xx}^{-1}S'_{xx}S_{xx}R_{xx}^{-1}R_{xy} = R_{yx}R_{xx}^{-1}R_{xy} = \hat{Y}'\hat{Y}$.

Speaking less technically it is only necessary to multiply the columns of
$S_{xx}R_{xx}^{-1}R_{xy}$ to get $\hat{Y}'\hat{Y}$.

A first illustration utilizes the correlations of the Carver anthropometric
data [10] for 1000 University of Michigan freshmen. This group may be regarded
as constituting a population, or it may be regarded as a random sample of a
larger population. For present purposes we regard it as a population. Height
($Y_1$) and weight ($Y_2$) are estimated from shoulder girth ($X_1$), chest girth ($X_2$),
waist girth ($X_3$), and right thigh girth ($X_4$). The calculation of $\hat{Y}'\hat{Y}$ and $E'E$
from the correlation matrices is shown in Table II.
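The square root computation of (69) to (71) can be sketched as follows, using the correlations of Table II. This is a minimal sketch, not the author's desk-calculator layout: the function name `square_root_method` is an assumption, and the code simply builds the triangular $S_{xx}$ row by row while carrying the columns of $R_{xy}$ along, so that no back solution is needed.

```python
import math


def square_root_method(r_xx, r_xy):
    """Return (S, T) with S upper triangular, S'S = r_xx, and T = S R_xx^{-1} R_xy."""
    n, m = len(r_xx), len(r_xy[0])
    s = [[0.0] * n for _ in range(n)]
    t = [[0.0] * m for _ in range(n)]
    for i in range(n):
        s[i][i] = math.sqrt(r_xx[i][i] - sum(s[k][i] ** 2 for k in range(i)))
        for j in range(i + 1, n):
            s[i][j] = (r_xx[i][j] - sum(s[k][i] * s[k][j] for k in range(i))) / s[i][i]
        for j in range(m):
            # The same row operation applied to the columns of R_xy.
            t[i][j] = (r_xy[i][j] - sum(s[k][i] * t[k][j] for k in range(i))) / s[i][i]
    return s, t


# Correlations of Table II (four girths predicting height and weight).
r_xx = [[1.000, 0.652, 0.554, 0.615],
        [0.652, 1.000, 0.747, 0.693],
        [0.554, 0.747, 1.000, 0.774],
        [0.615, 0.693, 0.774, 1.000]]
r_xy = [[0.313, 0.650],
        [0.280, 0.803],
        [0.182, 0.804],
        [0.166, 0.812]]
r_yy = [[1.000, 0.495],
        [0.495, 1.000]]

s, t = square_root_method(r_xx, r_xy)
# (71): multiplying the columns of T together gives the multiple correlation matrix.
yhat = [[sum(t[i][a] * t[i][b] for i in range(4)) for b in range(2)] for a in range(2)]
ee = [[r_yy[a][b] - yhat[a][b] for b in range(2)] for a in range(2)]
print(yhat)
print(ee)
```

The printed matrices should agree with the $\hat{Y}'\hat{Y}$ and $E'E$ blocks of Table II to about three decimals.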




















As a second illustration I use the correlations between the parts of two forms
of the Thorndike Intelligence Examination which Lorge has used in illustrating the
canonical correlation technique [11, 69-74]. The X's are the scores on the three
parts of Form A and the Y's are the scores on the three parts of Form B. In
this case we designate the results by r's and k's (rather than $\rho$'s and $\kappa$'s) since
the calculation is considered to be for a sample. The calculation of the sample
multiple correlation and multiple alienation matrices is presented in Table III.


TABLE III

                                                 Form A                      Form B
                                         $x_1$    $x_2$    $x_3$     $y_1$    $y_2$    $y_3$

$R_{yy}$                                                             1.0000   .8235    .7912
                                                                       —     1.0000    .8315
                                                                       —       —      1.0000

$R_{xx}$, $R_{xy}$                       1.0000   .7830    .7852     .8986    .7841    .8217
                                           —     1.0000    .8393     .7961    .8543    .8254
                                           —       —      1.0000     .7683    .8226    .8588

$S_{xx}$, $S_{xx}R_{xx}^{-1}R_{xy}$      1.0000   .7830    .7852     .8986    .7841    .8217
                                                  .6220    .3609     .1487    .3864    .2926
                                                           .5032     .0180    .1341    .2146

$\hat{Y}'\hat{Y}$                                                    .8299    .7645    .7858
                                                                       —      .7821    .7861
                                                                       —       —       .8069

$E'E$                                                                .1701    .0590    .0054
                                                                       —      .2179    .0454
                                                                       —       —       .1931



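10. The numerical values of the coefficients. The diagonal entries of the
multiple correlation matrix give the values of $\sum \hat{y}_i^2 = \sum \hat{y}_iy_i = \rho^2_{i(x)}$ while the
non-diagonal values are $\sum \hat{y}_i\hat{y}_j = \sum \hat{y}_iy_j = \sum y_i\hat{y}_j$. The diagonal entries of the multiple
alienation matrix are $\sum e_i^2 = \sum e_iy_i = \kappa^2_{i(x)}$ while the non-diagonal entries are
$\sum e_ie_j = \sum e_iy_j = \sum y_ie_j$. We are then able to write out any of the correlations
easily. Thus from Table II

$\rho_{1(x)} = \sqrt{\sum \hat{y}_1^2} = \sqrt{.117} = .342$,

$\rho_{2(x)} = \sqrt{\sum \hat{y}_2^2} = \sqrt{.794} = .891$,

$\kappa_{1(x)} = \sqrt{\sum e_1^2} = \sqrt{.883} = .940$,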






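$\kappa_{2(x)} = \sqrt{\sum e_2^2} = \sqrt{.206} = .454$,

$\rho_{12(x)} = \dfrac{\sum e_1e_2}{\sqrt{(\sum e_1^2)(\sum e_2^2)}} = \dfrac{.274}{\sqrt{(.883)(.206)}} = .642$,

$\rho_{\hat{y}_1\hat{y}_2} = \dfrac{\sum \hat{y}_1\hat{y}_2}{\sqrt{(\sum \hat{y}_1^2)(\sum \hat{y}_2^2)}} = \dfrac{.221}{\sqrt{(.117)(.794)}} = .725$,

$\rho_{y_1e_2} = \dfrac{\sum e_1e_2}{\sqrt{\sum e_2^2}} = \dfrac{.274}{\sqrt{.206}} = .604$,

$\rho_{y_2e_1} = \dfrac{\sum e_1e_2}{\sqrt{\sum e_1^2}} = \dfrac{.274}{\sqrt{.883}} = .291$,

$\rho_{y_1\hat{y}_2} = \dfrac{\sum \hat{y}_1\hat{y}_2}{\sqrt{\sum \hat{y}_2^2}} = \dfrac{.221}{\sqrt{.794}} = .248$,

$\rho_{\hat{y}_1y_2} = \dfrac{\sum \hat{y}_1\hat{y}_2}{\sqrt{\sum \hat{y}_1^2}} = \dfrac{.221}{\sqrt{.117}} = .646$.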



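TABLE IVa

Cell (i, j) | General | Illustration
(1, 1) | $r_{1(x)}$; $\sum \hat{y}_1^2$ | .9110; .8299
(1, 2) | $r_{\hat{y}_1\hat{y}_2}$; $r_{\hat{y}_1y_2}$, $r_{y_1\hat{y}_2}$; $\sum \hat{y}_1\hat{y}_2$ | .9489; .8392, .8644; .7645
(1, 3) | $r_{\hat{y}_1\hat{y}_3}$; $r_{\hat{y}_1y_3}$, $r_{y_1\hat{y}_3}$; $\sum \hat{y}_1\hat{y}_3$ | .9603; .8626, .8747; .7858
(2, 2) | $r_{2(x)}$; $\sum \hat{y}_2^2$ | .8844; .7821
(2, 3) | $r_{\hat{y}_2\hat{y}_3}$; $r_{\hat{y}_2y_3}$, $r_{y_2\hat{y}_3}$; $\sum \hat{y}_2\hat{y}_3$ | .9917; .8889, .8751; .7861
(3, 3) | $r_{3(x)}$; $\sum \hat{y}_3^2$ | .8983; .8069

TABLE IVb

Cell (i, j) | General | Illustration
(1, 1) | $k_{1(x)}$; $\sum e_1^2$ | .4124; .1701
(1, 2) | $r_{e_1e_2}$; $r_{e_1y_2}$, $r_{y_1e_2}$; $\sum e_1e_2$ | .3066; .1431, .1264; .0590
(1, 3) | $r_{e_1e_3}$; $r_{e_1y_3}$, $r_{y_1e_3}$; $\sum e_1e_3$ | .0298; .0131, .0123; .0054
(2, 2) | $k_{2(x)}$; $\sum e_2^2$ | .4668; .2179
(2, 3) | $r_{e_2e_3}$; $r_{e_2y_3}$, $r_{y_2e_3}$; $\sum e_2e_3$ | .2214; .0973, .1033; .0454
(3, 3) | $k_{3(x)}$; $\sum e_3^2$ | .4394; .1931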








It is possible to utilize a scheme of successive division if all these correlations
are desired when there are more than two predicted variables. By divisions we
compute in turn $\rho_{i(x)}$, $\rho_{\hat{y}_iy_j}$ and $\rho_{\hat{y}_i\hat{y}_j}$ from the multiple correlation matrix
and $\kappa_{i(x)}$, $\rho_{e_iy_j}$, $\rho_{y_ie_j}$, $\rho_{e_ie_j}$ from the multiple alienation matrix for each i, j. The
computational scheme is illustrated in Table IV where the correlations used
are the sample correlations of Table III. The calculations from the multiple
correlation matrix are presented in Table IVa and those from the multiple
alienation matrix in Table IVb.

In Table IVa the multiple correlation matrix is first entered on the third of
each three lines. The square root of each diagonal term is then extracted to give
the multiple correlation coefficients. The value of $r_{1(x)}$ is then locked in the
machine as a divisor and it is divided, in turn, into $\sum \hat{y}_1\hat{y}_2$, $\sum \hat{y}_1\hat{y}_3$ to get $r_{\hat{y}_1y_2}$ and
$r_{\hat{y}_1y_3}$. Then $r_{2(x)}$ is used as a divisor by division into $r_{\hat{y}_1y_2}$ to get $r_{\hat{y}_1\hat{y}_2}$, into $\sum \hat{y}_1\hat{y}_2$
to get $r_{y_1\hat{y}_2}$, and into $\sum \hat{y}_2\hat{y}_3$ to get $r_{\hat{y}_2y_3}$. Finally $r_{3(x)}$ is divided into $r_{\hat{y}_1y_3}$ to get $r_{\hat{y}_1\hat{y}_3}$,
into $\sum \hat{y}_1\hat{y}_3$ to get $r_{y_1\hat{y}_3}$, into $r_{\hat{y}_2y_3}$ to get $r_{\hat{y}_2\hat{y}_3}$ and into $\sum \hat{y}_2\hat{y}_3$ to get $r_{y_2\hat{y}_3}$. A
check on these divisions can be made, if desired, by dividing $r_{y_1\hat{y}_2}$ by $r_{1(x)}$ to get
$r_{\hat{y}_1\hat{y}_2}$, $r_{y_1\hat{y}_3}$ by $r_{1(x)}$ to get $r_{\hat{y}_1\hat{y}_3}$, and $r_{y_2\hat{y}_3}$ by $r_{2(x)}$ to get $r_{\hat{y}_2\hat{y}_3}$.

Table IVb is treated in a similar manner. 

This technique is immediately applicable to the case of many predicted 
variables. 
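The successive division scheme described above can be sketched in Python as follows. The function name `successive_division` and the dictionary layout are assumptions made for this illustration; the input is the sample multiple correlation matrix whose entries appear in Table IVa (.8299, .7645, .7858, .7821, .7861, .8069).

```python
import math


def successive_division(matrix):
    """Divide each off-diagonal entry by the square roots of the diagonal terms.

    Returns (root, one_sided, two_sided):
      root[i]            square root of the i-th diagonal term,
      one_sided[(i, j)]  entry (i, j) divided by root[i],
      two_sided[(i, j)]  entry (i, j) divided by root[i] * root[j], for i < j.
    """
    n = len(matrix)
    root = [math.sqrt(matrix[i][i]) for i in range(n)]
    one_sided = {(i, j): matrix[i][j] / root[i]
                 for i in range(n) for j in range(n) if i != j}
    two_sided = {(i, j): one_sided[(i, j)] / root[j]
                 for i in range(n) for j in range(i + 1, n)}
    return root, one_sided, two_sided


# Sample multiple correlation matrix (entries as printed in Table IVa).
yhat = [[0.8299, 0.7645, 0.7858],
        [0.7645, 0.7821, 0.7861],
        [0.7858, 0.7861, 0.8069]]
root, one_sided, two_sided = successive_division(yhat)
print(round(root[0], 4), round(one_sided[(0, 1)], 4), round(two_sided[(0, 1)], 4))
```

Applied to the multiple correlation matrix, `root` holds the multiple correlation coefficients, `one_sided[(i, j)]` the correlation of the i-th estimate with the j-th observed variable, and `two_sided[(i, j)]` the correlation of the two estimates. The same function applied to the multiple alienation matrix gives the corresponding quantities for the residuals.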


REFERENCES 

[1] M. Ezekiel, Methods of Correlation Analysis, Second Edition, Wiley, 1942.

[2] A. C. Aitken, "On the evaluation of determinants, the formation of their adjugates, etc.," Edinb. Math. Soc. Proc., Series 2, Vol. 3 (1933), pp. 207-219.

[3] P. S. Dwyer, "The evaluation of determinants," Psychometrika, Vol. 6 (1941), pp. 191-204.

[4] A. C. Aitken, Determinants and Matrices, Second Edition, Oliver and Boyd, Edinburgh, 1942.

[5] F. V. Waugh and P. S. Dwyer, "Compact computation of the inverse of a matrix," Annals of Math. Stat., Vol. 16 (1945), pp. 369-371.

[6] P. S. Dwyer, "A matrix presentation of least squares and correlation theory, etc.," Annals of Math. Stat., Vol. 15 (1944), pp. 82-89.

[7] P. S. Dwyer, "The square root method and its use in correlation and regression," Am. Stat. Assn. Jour., Vol. 40 (1945), pp. 493-503.

[8] D. B. Duncan and J. F. Kenney, On the Solution of Normal Equations and Related Topics, Edwards Brothers, Ann Arbor, 1946.

[9] A. C. Aitken, "The evaluation of a certain triple product matrix," Roy. Soc. of Edinb. Proc., Vol. 57 (1937), pp. 172-181.

[10] H. C. Carver, Anthropometric Data, Edwards Brothers, Ann Arbor, 1941.

[11] Irving Lorge, "The computation of Hotelling canonical correlations," Proceedings of Educational Research Forum, Endicott, N. Y., Aug. 26-31, 1940, pp. 68-74.



INVERSION FORMULAS IN NORMAL VARIABLE MAPPING 

By John Riordan 
Bell Telephone Laboratories, New York 

1. Summary. The two inversion formulas considered here arise from study of
G. A. Campbell's work on the Poisson summation, which is described more fully
in the introduction and in the main consists of finding a function or mapping of
a variable connected with the summation in terms of a normal (Gaussian)
variable g. More generally, this last is a process often called "normalization of
the variable" and associated with the names of E. A. Cornish and R. A. Fisher.
The mapping is two-way and the main inversion formula determines coefficients
for one way from those for the other, both sets of coefficients being descriptive
of their mappings. More precisely if x is a given variable, g a Gaussian variable,
y a parameter of the mapping, and the two mappings are

$x = g + \sum_{n=1}^{\infty} G_n(g)\, y^n/n!,$

$g = x + \sum_{n=1}^{\infty} X_n(x)\, y^n/n!,$

the formula expresses $G_n(x)$ in terms of $X_i(x)$, $i \le n$, and vice versa.

The second formula is more particularly related to the Poisson summation
and relates coefficients $p_n \equiv p_n(g)$ and $q_n \equiv q_n(g)$ in the pair of equations

$a = c \sum_{n=0}^{\infty} q_n c^{-n/2}/n!$

$c = a \sum_{n=0}^{\infty} p_n a^{-n/2}/n!$

Both formulas, which are necessarily elaborate, are given concise expression
by the use of the multi-variable polynomials of E. T. Bell.

2. Introduction. In 1923, in a paper little known in statistical circles, G. A.
Campbell [2] gave as the basis for his extensive tabulation of the Poisson summation
an asymptotic series expressing the average a in terms of a normal variable
g, corresponding to the probability of at least c occurrences, and c itself. That
is to say, he associated with the Poisson summation

$P(a, c) = \sum_{x=c}^{\infty} e^{-a}a^x/x!$

a normal variable g, defined by

$P(a, c) = \dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^{g} e^{-t^2/2}\, dt$



and inverted the summation (which, as is well known, is equivalent to the incomplete
Gamma function ratio) to give a series for a in terms of g and c. The
series, which is carried to 11 terms, starts as follows:

$a \sim c\left[1 + g\,c^{-1/2} + \dfrac{g^2 - 1}{3}\,c^{-1} + \dfrac{g^3 - 7g}{36}\,c^{-3/2} + \cdots\right]$

If $x = (a - c)/\sqrt{c}$ is introduced, this becomes

$x \sim g + \dfrac{g^2 - 1}{3}\,c^{-1/2} + \dfrac{g^3 - 7g}{36}\,c^{-1} + \cdots$

and x is seen to be, like g, a standardized variable of mean 0, variance 1.

It seems to have gone unnoticed that this result includes the $\chi^2$ distribution
through the transformation: $2a = \chi^2$, $2c = n$, and it has been rediscovered by
A. M. Peiser [7] (4 terms) and by Goldberg and Levine [4] (6 terms).

It is possible also to express c in terms of a and g, and a formula of this kind
with fewer terms which appears in a footnote in Campbell's paper is as follows:

$c \sim a\left[1 - g\,a^{-1/2} + \dfrac{g^2 + 2}{6}\,a^{-1} + \cdots\right]$

Finally there is a third possibility of expressing g in terms of the remaining
variables, preferably x and c; though unnoticed by Campbell this has since been
brought to prominence by Cornish and Fisher [3], Hotelling and Frankel [5]
and Kendall [6].

The idea behind the first expansion appears most clearly in the second form 
and is that for c large the variable x behaves nearly like g. The third possibility 
reverses this expansion and gives a function of x and c which behaves like g; 
hence if this function is first evaluated, reference to the normal integral table 
gives an immediate evaluation of the probabilities in question. Put in another 
way, the expansion widens the scope of the normal integral table and for this 
reason has been called “normalization” of the variable (but this term seems pre¬ 
empted by its use in another sense for orthogonal functions, and has been re¬ 
placed in the title by normal variable mapping). 

From the point of view of statistical theory, the three expressions are different
versions of one relationship, which suggests that there should be general rules
for transforming a series of one type into that of another. The two inversion
formulas given below supply these rules in what appears to be as compact a form
as the problem allows. It will be noted that the proofs given suppose convergent
series, a case which leads to clarity and brevity and is interesting in itself. Applied
to Campbell's series, they give the known results so far as the latter go,
but of course for other asymptotic series they need independent verifications.


3. First Inversion Formula. This relates coefficients in series like Campbell's
first and its reverse as in Cornish and Fisher. More precisely





If $G_1(g), G_2(g), \cdots$ are assigned polynomials and if

(1) $x = g + \sum_{n=1}^{\infty} G_n(g)\, y^n/n!,$

defines x in terms of g and a parameter y, then

(2) $g = x + \sum_{n=1}^{\infty} X_n(x)\, y^n/n!,$

where

(3) $-X_n(x) = Y_n(aG_1(x), aG_2(x), \cdots, aG_n(x)),$

TABLE I

Bell Polynomials $Y_n(fg_1, fg_2, \cdots, fg_n)$

$Y_1 = f_1g_1$

$Y_2 = f_1g_2 + f_2g_1^2$

$Y_3 = f_1g_3 + f_2(3g_2g_1) + f_3g_1^3$

$Y_4 = f_1g_4 + f_2(4g_3g_1 + 3g_2^2) + f_3(6g_2g_1^2) + f_4g_1^4$

$Y_5 = f_1g_5 + f_2(5g_4g_1 + 10g_3g_2) + f_3(10g_3g_1^2 + 15g_2^2g_1) + f_4(10g_2g_1^3) + f_5g_1^5$

$Y_6 = f_1g_6 + f_2(6g_5g_1 + 15g_4g_2 + 10g_3^2) + f_3(15g_4g_1^2 + 60g_3g_2g_1 + 15g_2^3) + f_4(20g_3g_1^3 + 45g_2^2g_1^2) + f_5(15g_2g_1^4) + f_6g_1^6$

$Y_7 = f_1g_7 + f_2(7g_6g_1 + 21g_5g_2 + 35g_4g_3) + f_3(21g_5g_1^2 + 105g_4g_2g_1 + 70g_3^2g_1 + 105g_3g_2^2) + f_4(35g_4g_1^3 + 210g_3g_2g_1^2 + 105g_2^3g_1) + f_5(35g_3g_1^4 + 105g_2^2g_1^3) + f_6(21g_2g_1^5) + f_7g_1^7$

$Y_8 = f_1g_8 + f_2(8g_7g_1 + 28g_6g_2 + 56g_5g_3 + 35g_4^2) + f_3(28g_6g_1^2 + 168g_5g_2g_1 + 280g_4g_3g_1 + 210g_4g_2^2 + 280g_3^2g_2) + f_4(56g_5g_1^3 + 420g_4g_2g_1^2 + 280g_3^2g_1^2 + 840g_3g_2^2g_1 + 105g_2^4) + f_5(70g_4g_1^4 + 560g_3g_2g_1^3 + 420g_2^3g_1^2) + f_6(56g_3g_1^5 + 210g_2^2g_1^4) + f_7(28g_2g_1^6) + f_8g_1^8$

$Y_n$ being the multivariable polynomial of E. T. Bell [1], in the variables $G_1(x)$ to
$G_n(x)$ and the symbolic variable a which is such that

$a^i \equiv a_i = (-D)^{i-1}, \quad D = d/dx,$

with differentiations on all products of $G_1(x)$ to $G_n(x)$ associated with it in the polynomial.

Note the symmetry of x and g, which allows the transformation to go either
way, the inverse of (3) being

(4) $-G_n(g) = Y_n(aX_1(g), aX_2(g), \cdots, aX_n(g))$

Table I gives explicit expressions for polynomials $Y_1$ to $Y_8$. It will be noted
that the number of terms in $Y_n$ is the number of partitions of n and that $f_i$, the





variable replacing $a_i$ in the table, is associated with terms corresponding to
partitions with i parts; that is to say, if $Y_{n,i}$ designates such terms

$Y_n = \sum_{i=1}^{n} f_i Y_{n,i}$

The verification or extension of the table may be accomplished by the formulas
and relations given by Bell (l.c.) or more directly by those modifications of Bell
given by myself in [8].

The first few instances of (3), dropping the common variable x for brevity,
may be read off from Table I (with appropriate changes of notation and interpretation
of $a_i$) as follows:

$-X_1 = G_1$

$-X_2 = G_2 - D(G_1^2)$

$-X_3 = G_3 - 3D(G_2G_1) + D^2(G_1^3)$

$-X_4 = G_4 - 4D(G_3G_1) - 3D(G_2^2) + 6D^2(G_2G_1^2) - D^3(G_1^4)$

Applied to Campbell's first formula in its second form with $y = c^{-1/2}$ and

$G_1(x) = (x^2 - 1)/3, \qquad G_3(x) = (-6x^4 - 14x^2 + 32)/270,$

$G_2(x) = (x^3 - 7x)/18, \qquad G_4(x) = (9x^5 + 256x^3 - 433x)/1680,$

these show e.g.

$-X_2 = \dfrac{x^3 - 7x}{18} - \dfrac{2(x^2 - 1)}{3}\cdot\dfrac{2x}{3} = \dfrac{-7x^3 + x}{18},$

and similarly for the others, resulting in

$X_1 = -(x^2 - 1)/3$

$X_2 = (7x^3 - x)/18$

$X_3 = -(219x^4 - 14x^2 - 13)/270$

$X_4 = (3993x^5 - 152x^3 + 119x)/1680$
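The first three instances of (3) can be checked mechanically. The sketch below is an illustration only: it represents each polynomial as a list of exact fractions indexed by power of x, and the helper names (`add`, `scale`, `mul`, `deriv`) are assumptions, not notation from the paper. It applies $-X_2 = G_2 - D(G_1^2)$ and $-X_3 = G_3 - 3D(G_2G_1) + D^2(G_1^3)$ to Campbell's $G_1$, $G_2$, $G_3$.

```python
from fractions import Fraction as F


def add(p, q):
    """Sum of two polynomials given as coefficient lists (index = power of x)."""
    n = max(len(p), len(q))
    p = list(p) + [F(0)] * (n - len(p))
    q = list(q) + [F(0)] * (n - len(q))
    return [a + b for a, b in zip(p, q)]


def scale(c, p):
    return [c * a for a in p]


def mul(p, q):
    out = [F(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def deriv(p, times=1):
    """Differentiate a coefficient list `times` times with respect to x."""
    for _ in range(times):
        p = [k * a for k, a in enumerate(p)][1:] or [F(0)]
    return p


# Campbell's coefficients as quoted in the text.
G1 = [F(-1, 3), F(0), F(1, 3)]                        # (x^2 - 1)/3
G2 = [F(0), F(-7, 18), F(0), F(1, 18)]                # (x^3 - 7x)/18
G3 = [F(32, 270), F(0), F(-14, 270), F(0), F(-6, 270)]  # (-6x^4 - 14x^2 + 32)/270

X1 = scale(-1, G1)
X2 = scale(-1, add(G2, scale(-1, deriv(mul(G1, G1)))))
X3 = scale(-1, add(add(G3, scale(-3, deriv(mul(G2, G1)))),
                   deriv(mul(mul(G1, G1), G1), times=2)))
print(X2)
print(X3)
```

The lists printed for `X2` and `X3` correspond to $(7x^3 - x)/18$ and $-(219x^4 - 14x^2 - 13)/270$, the values listed above.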

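These determine a calculation formula for the Poisson summation, which is
a refinement of the normal approximation. That is to say

$P(a, c) = \dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^{g} e^{-t^2/2}\, dt$

with

$g = x - \dfrac{x^2 - 1}{3\sqrt{c}} + \dfrac{7x^3 - x}{36c} - \dfrac{219x^4 - 14x^2 - 13}{1620c\sqrt{c}} + \dfrac{3993x^5 - 152x^3 + 119x}{40320c^2} + \cdots$

and $x = (a - c)/\sqrt{c}$.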
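As a rough numerical illustration of this calculation formula, the sketch below compares the exact Poisson summation with the normal integral evaluated at the refined g. The function names are assumptions made for the example, the coefficients are taken as printed above, and the case a = 12, c = 10 is an arbitrary choice rather than one treated in the paper.

```python
import math


def normal_cdf(g):
    """Normal integral from minus infinity to g."""
    return 0.5 * (1.0 + math.erf(g / math.sqrt(2.0)))


def poisson_summation(a, c):
    """Exact probability of at least c occurrences when the average is a."""
    return 1.0 - sum(math.exp(-a) * a ** k / math.factorial(k) for k in range(c))


def refined_g(a, c):
    """Normal deviate g from the five-term series in x = (a - c)/sqrt(c)."""
    x = (a - c) / math.sqrt(c)
    return (x
            - (x ** 2 - 1) / (3 * math.sqrt(c))
            + (7 * x ** 3 - x) / (36 * c)
            - (219 * x ** 4 - 14 * x ** 2 - 13) / (1620 * c * math.sqrt(c))
            + (3993 * x ** 5 - 152 * x ** 3 + 119 * x) / (40320 * c ** 2))


a, c = 12.0, 10
exact = poisson_summation(a, c)
refined = normal_cdf(refined_g(a, c))
crude = normal_cdf((a - c) / math.sqrt(c))   # plain normal approximation, g = x
print(exact, refined, crude)
```

For this case the refined value should lie much closer to the exact summation than the crude normal approximation does.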





For the t-variate, the formula is applied in the reverse direction since Hotelling
and Frankel supply the first four values of $X_n$, that is, in present notation, the
series

$g = x - \dfrac{x^3 + x}{4}\,y + \dfrac{13x^5 + 8x^3 + 3x}{48}\,\dfrac{y^2}{2} - \dfrac{35x^7 + 19x^5 + x^3 - 15x}{64}\,\dfrac{y^3}{6} + \dfrac{6271x^9 + 3224x^7 - 102x^5 - 1680x^3 - 945x}{3840}\,\dfrac{y^4}{24} + \cdots$

The reversed series (obtained by (4)) is

$x = g + \dfrac{g^3 + g}{4}\,y + \dfrac{5g^5 + 16g^3 + 3g}{48}\,\dfrac{y^2}{2} + \dfrac{3g^7 + 19g^5 + 17g^3 - 15g}{64}\,\dfrac{y^3}{6} + \dfrac{79g^9 + 776g^7 + 1482g^5 - 1920g^3 - 945g}{3840}\,\dfrac{y^4}{24} + \cdots$

The first three terms are checked by Goldberg and Levine (l.c.).

Another application worth noting is to the formulas of Cornish and Fisher
which give $G_i(g)$ and $X_i(x)$ in terms of the relative cumulants of the distribution;
to save space these are omitted.

The derivation of the formula may be indicated most easily by Lagrange's
formula for the expansion of one function in powers of another in the following
form¹:

Let C be a contour in the complex z plane enclosing the point z = x, and let
$f(z)$ and $\phi(z)$ be analytic on and inside C. Let y be such that $|y\phi(z)| < |z - x|$
when z is on C, and g be that root of the equation:

(5) $g = x + y\phi(g)$

which lies inside C. Then

(6) $f(g) = \dfrac{1}{2\pi i} \int_C f(z)\, \dfrac{d}{dz}\{\log [z - x - y\phi(z)]\}\, dz = f(x) + \sum_{n=1}^{\infty} X_n^{f}(x)\, y^n/n!$

where

(7) $X_n^{f}(x) = D^{n-1}[f'(x)(\phi(x))^n]$

The contour integral in (6) appears, slightly disguised, as a problem in Whittaker
and Watson [Modern Analysis, Cambridge, 1920, p. 149]. The evaluation
(7) is given for completeness, though no use is made of it in this section, the
derivation proceeding directly from (6).

First notice that by (1) and (5)

$-y\phi(g) = \sum_{n=1}^{\infty} G_n(g)\, y^n/n!,$

¹ The author owes the suggestion for this to S. O. Rice, who also simplified the derivation
of the second inversion formula given later.





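so that the logarithm in (6) may be written

$\log \left(z - x + \sum_{n=1}^{\infty} G_n(z)\, y^n/n!\right),$

or

$\log (z - x) + \log \left[1 + \sum_{n=1}^{\infty} G_n(z)(z - x)^{-1}\, y^n/n!\right],$

or

(8) $\log (z - x) + \log \exp by,$

with b a symbolic variable such that

$b^0 \equiv b_0 = 1$

$b^n \equiv b_n = G_n(z)(z - x)^{-1}.$

Now if

(9) $\log (\exp by) = B_1y + B_2y^2/2! + \cdots = \exp By,$

B being another symbolic variable, $B_0 = 0$, $B^n \equiv B_n$, it follows from equation
(5) of [8] that

(10) $B_n = [D_y^n \log (\exp by)]_{y=0}, \quad D_y = d/dy,$

$\quad = Y_n(\beta b_1, \beta b_2, \cdots, \beta b_n)$

$\quad = \sum_{i=1}^{n} \beta_i Y_{n,i}(b_1, b_2, \cdots, b_n),$

with $\beta_i = (-1)^{i-1}(i - 1)!$ and $Y_{n,i}$ the part of polynomial $Y_n$ having i parts,
as defined above. Moreover, each factor $b_k$ of terms in $Y_{n,i}$ contributes
$G_k(z)(z - x)^{-1}$ so that

(11) $B_n = \sum_{i=1}^{n} \beta_i (z - x)^{-i}\, Y_{n,i}(G_1(z), G_2(z), \cdots, G_n(z))$

Then, by (6)

$f(g) = \dfrac{1}{2\pi i} \int_C f(z)\, \dfrac{d}{dz}\{\log (z - x) + \exp By\}\, dz$

$\quad = f(x) - \dfrac{1}{2\pi i} \int_C f'(z) \exp By\, dz$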





$\quad = f(x) - \sum_{n=1}^{\infty} \dfrac{y^n}{n!} \sum_{i=1}^{n} \int_C \dfrac{(-1)^{i-1}(i - 1)!}{2\pi i\,(z - x)^{i}}\, Y_{n,i}(G_1(z), \cdots, G_n(z))\, f'(z)\, dz$

$\quad = f(x) - \sum_{n=1}^{\infty} \dfrac{y^n}{n!} \sum_{i=1}^{n} (-D)^{i-1}[f'(x)\, Y_{n,i}(G_1(x), \cdots, G_n(x))]$

with D = d/dx. The evaluation in the last line is by the Cauchy formula for
derivatives; the second line is derived by an integration by parts.

Equation (4) follows from this and the substitution f(g) = g.


4. Second Inversion Formula. This gives the interrelations of coefficients of
series like the two Campbell series mentioned in the introduction. It runs as
follows:

If $q_1(g), q_2(g), \cdots$ are given polynomials and if

(12) $a = c \sum_{n=0}^{\infty} q_n(g)\, \dfrac{c^{-n/2}}{n!}$

defines a in terms of g and a parameter c; then

(13) $c = a \sum_{n=0}^{\infty} p_n(g)\, \dfrac{a^{-n/2}}{n!}$

where

(14) $-p_n(g) = Y_n(\alpha q_1(g), \alpha q_2(g), \cdots, \alpha q_n(g))$

with $\alpha = \alpha_1 = 1$; $\alpha^i \equiv \alpha_i = (n - 4)(n - 6) \cdots (n - 2i)\,2^{-i+1}$.

Equation (14) is formally similar to (3) and by symmetry as before, $q_n(g)$ is
readily expressible as a $Y_n$ polynomial in $p_1(g)$ to $p_n(g)$.

The first five instances of (14), dropping the argument for brevity, are

$-p_1 = q_1$

$-p_2 = q_2 - q_1^2$

$-p_3 = q_3 - \tfrac{3}{2} q_2q_1 + \tfrac{3}{4} q_1^3$

$-p_4 = q_4$

$-p_5 = q_5 + \tfrac{5}{2}(q_4q_1 + 2q_3q_2) - \tfrac{5}{4}(2q_3q_1^2 + 3q_2^2q_1) + \tfrac{15}{4} q_2q_1^3 - \tfrac{15}{16} q_1^5$

Applied to Campbell's first series where

$q_1(g) = g \qquad q_3(g) = (g^3 - 7g)/6$

$q_2(g) = \tfrac{2}{3}(g^2 - 1) \qquad q_4(g) = (-12g^4 - 28g^2 + 64)/135$

$q_5(g) = (36g^5 + 1024g^3 - 1732g)/1296$





these show that

$p_1(g) = -g \qquad p_3(g) = (g^3 + 2g)/12$

$p_2(g) = (g^2 + 2)/3 \qquad p_4(g) = (12g^4 + 28g^2 - 64)/135$

$p_5(g) = (207g^5 + 2596g^3 - 6148g)/1296$
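Formula (14) can be exercised for small n with a short program. The sketch below is illustrative only: the helper names are assumptions, polynomials in g are stored as lists of exact fractions, and the parts $Y_{n,i}$ are generated by the standard recurrence for partial Bell polynomials rather than read from Table I. It recomputes $p_1$ to $p_4$ from Campbell's $q_1$ to $q_4$.

```python
from fractions import Fraction as F
from math import comb


def padd(p, q):
    n = max(len(p), len(q))
    p = list(p) + [F(0)] * (n - len(p))
    q = list(q) + [F(0)] * (n - len(q))
    return [a + b for a, b in zip(p, q)]


def pscale(c, p):
    return [c * a for a in p]


def pmul(p, q):
    out = [F(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def trim(p):
    """Drop trailing zero coefficients, keeping at least one entry."""
    p = list(p)
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p


def bell_part(n, i, xs):
    """Y_{n,i}(x_1, ..., x_n): the terms of Y_n belonging to partitions with i parts."""
    if n == 0 and i == 0:
        return [F(1)]
    if n == 0 or i == 0:
        return [F(0)]
    total = [F(0)]
    for k in range(1, n - i + 2):
        term = pmul(xs[k - 1], bell_part(n - k, i - 1, xs))
        total = padd(total, pscale(comb(n - 1, k - 1), term))
    return total


def alpha(n, i):
    """alpha_i = (n - 4)(n - 6)...(n - 2i) 2^(-i+1), with alpha_1 = 1."""
    value = F(1)
    for j in range(2, i + 1):
        value *= F(n - 2 * j, 2)
    return value


def p_from_q(n, qs):
    """p_n from (14): -p_n = sum over i of alpha_i Y_{n,i}(q_1, ..., q_n)."""
    total = [F(0)]
    for i in range(1, n + 1):
        total = padd(total, pscale(alpha(n, i), bell_part(n, i, qs)))
    return trim(pscale(-1, total))


# Campbell's q_1 ... q_4 as quoted in the text (index = power of g).
qs = [[F(0), F(1)],
      [F(-2, 3), F(0), F(2, 3)],
      [F(0), F(-7, 6), F(0), F(1, 6)],
      [F(64, 135), F(0), F(-28, 135), F(0), F(-12, 135)]]
p1, p2, p3, p4 = (p_from_q(n, qs) for n in range(1, 5))
print(p1, p2, p3, p4)
```

Because $\alpha_2$, $\alpha_3$ and $\alpha_4$ all contain the factor (n - 4), they vanish for n = 4, which is why $-p_4 = q_4$ in the list of instances.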

The proof of (14) is as follows. First, for brevity introduce symbolic variables
p and q with the usual interpretation $p^n \equiv p_n(g)$, $q^n \equiv q_n(g)$ so that (12) and
(13) read

$a = c \exp q c^{-1/2}$

$c = a \exp p a^{-1/2}$

Now write $a = 1/x^2$, $c = 1/y^2$, changing these to

$x = y (\exp qy)^{-1/2}$

$y = x (\exp px)^{-1/2}$

and note that

(15) $x^2y^{-2} = (\exp qy)^{-1} = \exp px$

which shows that $p_n$ is the coefficient of $x^n/n!$ in the expansion in powers of x
of $(\exp qy)^{-1}$. Lagrange's formula gives at once (D = d/dy):

(16) $f(y) = f(0) + \sum_{n=1}^{\infty} \dfrac{x^n}{n!}\, [D^{n-1}[f'(y)(\exp qy)^{n/2}]]_{y=0}$

so that

$(\exp qy)^{-1} = 1 - \sum_{n=1}^{\infty} \dfrac{x^n}{n!}\, [D^{n-1}[(\exp qy)^{(n-4)/2} D(\exp qy)]]_{y=0}$

or

(17) $-p_n = \dfrac{2}{n - 2}\, [D^n (\exp qy)^{(n-2)/2}]_{y=0}$

$\quad = Y_n(\alpha q_1, \alpha q_2, \cdots, \alpha q_n)$

with $\alpha_i$ as in (14), by equation (6) of [8].

BIBLIOGRAPHY 

[1] Bell, E. T., "Exponential polynomials," Annals of Math. Vol. 35, (1934), pp. 258-277.

[2] Campbell, G. A., “Probability curves showing Poisson’s exponential summation,” 





Bell System Tech. Jl. Vol. (1923), pp. 95-113; Collected Papers, New York, 1937, pp. 224-242.

[3] Cornish, E. A. and Fisher, R. A., "Moments and cumulants in the specification of distributions," Revue de l'Institut Intern. de Stat. Vol. 4, (1937), pp. 1-14.

[4] Goldberg, H. and Levine, H., "Approximate formulas for the percentage points and normalization of t and $\chi^2$," Annals of Math. Stat. Vol. 17, (1946), pp. 216-225.

[5] Hotelling, H. and Frankel, L. R., "The transformation of statistics to simplify their distribution," Annals of Math. Stat. Vol. 9, (1938), pp. 87-96.

[6] Kendall, M. G., The Advanced Theory of Statistics I, London, 1943.

[7] Peiser, A. M., "Asymptotic formulas for significance levels of certain distributions," Annals of Math. Stat. Vol. (1943), pp. 56-62.

[8] Riordan, John, "Derivatives of Composite Functions," Bull. Am. Math. Soc. Vol. 52, (1946), pp. 664-667.



ON THE DETERMINATION OF OPTIMUM PROBABILITIES 

IN SAMPLING 


By Morris H. Hansen and William N. Hurwitz
Bureau of the Census

1. Summary. In a previous paper [2] it was shown that it is sometimes
profitable to select sampling units with probability proportionate to size of the
unit. This note indicates a method of determining the probabilities of selection
which minimize the variance of the sample estimate at a fixed cost. Some approximations
that have practical applications are given.

2. Introduction. Neyman has shown that it is possible to reduce the sampling
variance of an estimate by dividing a population into sub-populations (called
strata) and varying the proportions of units included in the sample from stratum
to stratum [1]. His treatment presumed that the units within any stratum would
be drawn with equal probability. In many practical sampling problems, the use
of constant probabilities is neither necessary nor desirable. Not only is it possible
to obtain unbiased or consistent estimates with varying probabilities of selection
of the sampling units, but also it is possible to reduce the variance of sample
estimates by appropriate use of this device.

It has been shown [2] that in a subsampling system, the selection of primary
units with probabilities proportionate to the number of elements included in the
primary unit may bring about marked reductions in sampling variances over
sampling with equal probabilities. In this note, we shall indicate a method of
determining the optimum probabilities under certain conditions, and also some
approximations to the optima that have practical applications.

By optimum probabilities, we mean the set of probabilities of selection that
will minimize the variance for a fixed cost of obtaining sample results, or alternatively
that will minimize the cost for a fixed sampling error.

3. Optimum probability with a subsampling system. Consider, for example,
the simple subsampling system where primary units are first drawn for inclusion
in the sample and then a sample of elements is drawn from the selected primary
units. We shall suppose, for simplicity of notation, that the sampling is done without
stratification. The conclusions indicated below will be similar if stratified
sampling is used, and they will hold even if only one unit is drawn from each
stratum. Suppose that a population contains M primary units, and that the
sampling of primary units is to be done with replacement. Sampling with replacement
is assumed in order to simplify the mathematics. We wish to estimate
the ratio

$\dfrac{X}{Y} = \dfrac{\sum_{i=1}^{M} \sum_{j=1}^{N_i} X_{ij}}{\sum_{i=1}^{M} \sum_{j=1}^{N_i} Y_{ij}}$


where $X_{ij}$ and $Y_{ij}$ are the values of two characteristics of the jth element within
the ith primary unit, and $N_i$ is the number of elements in the ith primary unit.
A consistent estimate of X/Y is given by

(1) $r = \dfrac{\sum_{i=1}^{m} \dfrac{1}{P_i} \dfrac{N_i}{n_i} \sum_{j=1}^{n_i} X_{ij}}{\sum_{i=1}^{m} \dfrac{1}{P_i} \dfrac{N_i}{n_i} \sum_{j=1}^{n_i} Y_{ij}}$

where

$P_i$ = The probability of selecting the ith primary unit on a single draw.

$n_i$ = The total number of elements included in the sample from the ith unit
if it is drawn. If a particular unit happens to be included in the sample
more than once the subsampling will be independently carried through
each time it is drawn.

m = The total number of primary units included in the sample.

It will be assumed that a self-weighting sample is to be used, i.e., that although
the probabilities of selecting primary units will vary, the subsampling rate
within the ith selected primary unit, $n_i/N_i$, will be such that $P_i n_i/N_i = k$. Note that,
with this condition, k is the probability that an element will be included in the
sample by making a single draw of a primary unit, and by carrying out the specified
subsampling within the selected primary unit. It follows that mkN is the
expected total number of elements included in a sample of m primary units, where

$N = \sum_{i=1}^{M} N_i.$

The method can be extended to cover situations where other conditions are imposed.

We shall express the variance of r in terms of $P_i$, m, and k, and also express
the cost in terms of these same quantities. The optimum values of $P_i$, m, and k
will then be determined.

The variance of the sample estimate. To terms of order 1/m of the Taylor expansion
of a ratio, the sampling variance of the estimate (1) is approximately


( 2 ) 


s 

Or 


<■■1 X i <«■! Jt i iV ifti 




2 


where 






i J , 5* a 

— O’** “f* -JPj (Tiy — 2 iTiaiff 






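$\sigma^2_{iX} = \dfrac{\sum_{j=1}^{N_i} (X_{ij} - \bar{X}_i)^2}{N_i - 1}$

$\sigma^2_{iY} = \dfrac{\sum_{j=1}^{N_i} (Y_{ij} - \bar{Y}_i)^2}{N_i - 1}$

$\sigma_{iXY} = \dfrac{\sum_{j=1}^{N_i} (X_{ij} - \bar{X}_i)(Y_{ij} - \bar{Y}_i)}{N_i - 1}$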
Ni-l 


The cost function. Now suppose that the total cost of the sampling procedure
involves a fixed cost attached to each primary unit included in the sample, a
cost of listing the elements within each selected primary unit (this listing may be
necessary in order to draw a subsample), and a cost of obtaining information from
each of the elements selected for inclusion in the sample. Under these circumstances
the total expected cost of the survey will be:

(3) $C = C_1m + C_2m \sum_{i=1}^{M} P_iN_i + C_3mkN$

where

$C_1$ = The fixed cost per primary unit.

$C_2$ = The cost of listing one element in a selected primary unit and other
costs that vary with the number of elements to be listed.

$C_3$ = The cost of obtaining the required information from one element in
the sample.

$\sum_{i=1}^{M} P_iN_i$ = Expected number of elements in the sample per primary unit in
the sample.

mk = The over-all sampling ratio, and

$N = \sum_{i=1}^{M} N_i$ = The total number of elements in the population.

It will be noted that although the values of $P_i$ and m may be fixed in advance,
the number of elements to be listed, $\sum_{i=1}^{m} N_i$, remains a chance variable. It is for
this reason that we consider the expected cost rather than the actual cost.
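The expected cost (3) is easy to evaluate for any proposed system of selection probabilities. The sketch below is only an illustration: the function names, the three unit sizes and the cost figures are hypothetical, and the helper that makes probabilities proportionate to a power of size is a convenience for the example, not a formula from this paper.

```python
def selection_probabilities(sizes, power):
    """Probabilities proportionate to size**power (1 = size, 0.5 = square root, 0 = equal)."""
    weights = [n ** power for n in sizes]
    total = sum(weights)
    return [w / total for w in weights]


def expected_cost(c1, c2, c3, m, k, probs, sizes):
    """Expected cost (3): C = C1*m + C2*m*sum(P_i*N_i) + C3*m*k*N."""
    n_total = sum(sizes)                                          # N
    expected_listing = sum(p * n for p, n in zip(probs, sizes))   # sum of P_i N_i
    return c1 * m + c2 * m * expected_listing + c3 * m * k * n_total


# Hypothetical population of three primary units and hypothetical unit costs.
sizes = [100, 400, 900]
probs = selection_probabilities(sizes, 0.5)   # proportionate to square root of size
cost = expected_cost(c1=10.0, c2=0.1, c3=2.0, m=5, k=0.01, probs=probs, sizes=sizes)
print(probs, cost)
```

Only the listing term depends on the probabilities, so changing `power` shows directly how far a system that favors large units raises the expected listing cost.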

The optimum values of $P_i$, m, and k. The values of $P_i$, m, and k which minimize
the variance (2) subject to the conditions that:






are given by 


(4) 


(5) 


( 6 ) 

where 


Pi = 




mii 


Cl + CiJNT, 


t/; 


mbi 


Cl + CiNi 



Cl + Cl z PiNi + Cjfciv’ 


2 



Ordinarily $\delta_i$ will be positive although it will often be found to be negative for
some i. For a great many populations, such negative values can be avoided by
classifying the primary units into size groups or other significant groups and then
requiring that the probability of selection be $P_a$ for every primary unit in the
a-th group.

In actual practice, however, in advance of designing a sample one does not
have the data to compute the optima and uses methods of approximating the
optimum probabilities. Methods of approximating the optimum probabilities are
given below.

4. Some rules for approximating the optimum probabilities. In another
paper [2] considerations were presented from which it follows that $\delta_i$ tends to
decrease with increasing size of unit, but seldom as fast as the size of unit increases.
The rate of decrease is often small relative to the increase in $N_i$, and
empirical data for a number of problems indicate that even the assumption of
$\delta_i$ being fairly constant with increasing size of unit may not lead one far astray
from the optimum probabilities. Under this assumption ($\delta_i = \delta$ for all i) the
probabilities depend only on $N_i$, $C_1$, and $C_2$, and lead to the following results:

(a) When $C_1 > 0$ and $C_2 = 0$, probability proportionate to size will be the
optimum.

(b) When $C_1 = 0$ and $C_2 > 0$, probability proportionate to the square root
of the size will be the optimum.





If we go to the other extreme (extreme not in terms of mathematically possible 
values but in terms of most practical populations), and assume that Si decreases 
at the same rate that Ni increases, the results would be: 

(a) When Ci > 0 and Cj = 0, probability proportionate to the square root 
of the size will be the optimum. 

(b) When Ci = 0 and C 2 > 0, equal probability will be the optimum. 

The minimum is broad in the neighborhood of the optimum and the results for
either of these extremes and the values in between often will give results
reasonably close to the minimum. This leads to the following useful approximations:

(a) When $C_2 \sum P_i N_i$, the expected cost per primary unit of listing and related
operations, is small in relation to $C_1$, the fixed cost per primary unit, the
optimum probabilities will be between probability proportionate to size
and probability proportionate to the square root of size, and either of these
will be reasonably close to the optimum.

(b) When $C_1$ is small compared to $C_2 \sum P_i N_i$, the optimum probability will be
between equal probability and probability proportionate to the square root
of size, and either of these will be reasonably close to the optimum.

(c) When both $C_1$ and $C_2 \sum P_i N_i$ are of significant size, i.e., when the costs
vary substantially both with the number of primary units in the sample
and the size of the units, then probability proportionate to the square root
of the size will be a reasonably good approximation to the optimum.

(d) When units of small size are used and all of the subunits in the selected
primary units are included in the sample (that is, there is no subsampling)
equal probability is close to the optimum. It should be noted that this
rule does not follow directly from the above analysis based on subsampling,
but from a separate analysis in which no subsampling is involved.
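The three systems of probabilities compared in rules (a) to (d) can be sketched as a single family in which the chance of selecting a primary unit is proportional to a power of its size. This is only a minimal illustration: the function name `selection_probabilities` and the parameter `power` are assumptions introduced here, not notation from the paper.

```python
def selection_probabilities(sizes, power):
    """Return selection probabilities proportional to size**power.

    power = 1.0 -> probability proportionate to size
    power = 0.5 -> probability proportionate to the square root of size
    power = 0.0 -> equal probability
    (The 'power' parametrization is an illustrative assumption.)
    """
    weights = [n ** power for n in sizes]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical primary-unit sizes N_i.
sizes = [1, 4, 16]
pps = selection_probabilities(sizes, 1.0)        # proportionate to size
root = selection_probabilities(sizes, 0.5)       # proportionate to sqrt(size)
equal = selection_probabilities(sizes, 0.0)      # equal probability
```

Which power is appropriate depends on the relative magnitudes of $C_1$ and $C_2 \sum P_i N_i$, as the rules above describe; the sketch does not choose it automatically.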

For whatever system of probabilities is used, and with the cost function given 
by (3), the optimum value of k is given by: 



which can be approximated, in application, from prior experience or preliminary
studies. The corresponding optimum value for $m$ is obtained by substitution in
the cost function.

The above results should not be accepted, of course, as the optima for every
cost function or every sampling system. Either past experimental data may be
available or pilot tests may be made to determine the cost function and the appropriate
approximations that should be used in various practical situations.

An illustration. An illustration may be of interest. A characteristic published
for city blocks in the 1940 Census of Housing is the number of dwelling
units that are in need of major repairs or that lack a private bath. Suppose we





were sampling to estimate the proportion of the dwelling units having this
characteristic for the Bronx in New York City, at the time of the 1940 Census. Let
us assume that once we selected a system of probabilities we used the optimum
numbers of blocks and the optimum sampling ratios appropriate to these
probabilities, that is, the optimum values of k and m. For each of several cost
functions the following Table 1 shows the sampling variances of each system,
relative to the variance of sampling with equal probability. It also shows values of
$C_2 \sum P_i N_i$ for comparison with $C_1$.

TABLE 1

                  | Average cost per primary unit    | Variances relative to
   Unit costs     | of listing and related           | equal probability
                  | operations ($C_2 \sum P_i N_i$)  |
------------------+----------------------------------+---------------------------
                  | Equal    Prop. to    Prop. to    | Equal   Prop. to  Prop. to
 C_1   C_2   C_3  | prob.    sq. root    size        | prob.   sq. root  size
                  |          of size                 |         of size
------------------+----------------------------------+---------------------------
  5    .10    1   | 13.49    21.15       27.63       |  100      92       104
  5    .05    1   |  6.75    10.58       13.82       |  100      88        97
  5    .02    1   |  2.70     4.23        5.53       |  100      83        87
  5    0      1   |  0        0           0          |  100      75        73
  2    .10    1   | 13.49    21.15       27.63       |  100      96       111
  2    .05    1   |  6.75    10.58       13.82       |  100      93       106
  2    .02    1   |  2.70     4.23        5.53       |  100      90        97
  2    0      1   |  0        0           0          |  100      79        77
  1    .10    1   | 13.49    21.15       27.63       |  100      97       114
  1    .05    1   |  6.75    10.58       13.82       |  100      96       110
  1    .02    1   |  2.70     4.23        5.53       |  100      93       103
  1    0      1   |  0        0           0          |  100      82        81
  0    .10    1   | 13.49    21.15       27.63       |  100      99       117
  0    .05    1   |  6.75    10.58       13.82       |  100      99       115
  0    .02    1   |  2.70     4.23        5.53       |  100      99       113

Some of the costs given in the table do not have unreasonable relationships 
in terms of the situations encountered in practice in various types of jobs. The 
comparisons are not affected by the absolute magnitudes of the costs but only 
by their relative magnitudes. The results are consistent with the rough rules of 
thumb given above. It is worth noting that in each of the above instances
probability proportionate to the square root of the size yields a comparatively low
variance.






5. Sampling with or without replacement. In this paper the sampling with 
varying probabilities was assumed to be carried out with replacement which 
ordinarily would not be advisable in practice. When sampling is done without 
replacement the optimum probabilities and their approximations will be about 
the same as for sampling with replacement in at least those instances where the 
proportion of the population in the sample is small. Further investigation is 
needed for large sampling rates. 

6. Conclusion. In summary, it is not essential and may not be desirable to 
give each element in the population (or stratum) the same chance of being drawn 
in order to avoid bias or to have a consistent estimate. Estimate (1) is a consistent
estimate no matter what probabilities of selection are assigned to these
units. The use of variable probabilities of selection is another device to be added 
to those already in the literature, such as stratification and efficient methods of 
estimation, which make it possible to achieve the objectives of a sample survey 
at reduced costs. Reference [2] gives another illustration of reductions in sampling 
variance achieved through the use of varying probabilities in accordance with 
the rules suggested above for approximating the optimum probabilities. 

REFERENCES 

[1] Jerzy Neyman, "On the two different aspects of the representative method: the
method of stratified sampling and the method of purposive selection," Roy. Stat.
Soc. Jour., New Series, Vol. 97 (1934), pp. 558-606.

[2] Morris H. Hansen and William N. Hurwitz, "On the theory of sampling from finite
populations," Annals of Math. Stat., Vol. 14 (1943), pp. 333-362.



A SOLUTION TO THE PROBLEM OF OPTIMUM CLASSIFICATION 

By P. G. Hoel and R. P. Peterson 
University of California, Los Angeles

1. Summary. By means of a general theorem, the space of the variables of 
classification is separated into population regions such that the probability 
of a correct classification is maximized. The theorem holds for any number of 
populations and variables but requires a knowledge of population parameters 
and probabilities. A second theorem yields a large sample criterion for determining
an optimum set of estimates for the unknown parameters. The two
theorems combine to yield a large sample solution to the problem of how best to 
discriminate between two or more populations. 

2. Introduction. There are essentially two basic problems in discriminant 
analysis. The first problem is to test whether the populations differ, since it 
would be futile to attempt a classification if the populations did not differ. The 
second problem is to find an efficient method for classifying individuals into their 
proper populations. In this paper, an optimum asymptotic solution of the 
second problem will be presented. 

3. Parameters known. Let $f_i = f_i(x_1, \cdots, x_k)$, $(i = 1, \cdots, r)$, denote the
probability density function of population $i$ in the region under consideration.
Let $p_i > 0$, $(i = 1, \cdots, r)$, denote the probability that population $i$ will be
sampled if a single individual is selected at random from that region, and let $R$
denote the $k$ dimensional Euclidean variable space. Then the desired theorem
is the following:

Theorem 1. If $M_i$ denotes the region in $R$ where $p_i f_i \geq p_j f_j$, $(j = 1, \cdots, r)$,
and where $p_i f_i > 0$, then the set of regions $M_i$, $(i = 1, \cdots, r)$, in which any
overlap is assigned to the $M_i$ with the smallest index, will maximize the probability
of a correct classification.

For the purpose of proving this theorem, consider any other set of non-overlapping
regions, $M_i'$. Since the addition to any of the regions $M_i'$ of a part
of $R$ throughout which all the functions $f_i$ vanish will not affect the probability
of a correct classification, there is no loss of generality in assuming that the set of
regions $M_i'$ contains the same portion of $R$ as the set of regions $M_i$ does. The
relationship between the two sets may be expressed by means of the formulas

$$M_i = \sum_{j=1}^{r} M_{ij} \tag{1}$$

and

$$M_j' = \sum_{i=1}^{r} M_{ij}, \tag{2}$$

where $M_{ij}$ denotes that part of $M_i$ which is contained in $M_j'$.





Since a sample point that falls in the region $M_i$ will be judged to have come
from population $i$, the probability of the correct classification of a single random
sample by means of the set $M_i$ is given by

$$Q = p_1 \int_{M_1} f_1\, dE + \cdots + p_r \int_{M_r} f_r\, dE, \tag{3}$$

where $dE = dx_1\, dx_2 \cdots dx_k$. If $Q'$ denotes the probability of the correct
classification by means of the set $M_i'$,

$$Q' = p_1 \int_{M_1'} f_1\, dE + \cdots + p_r \int_{M_r'} f_r\, dE.$$

In the notation of (1) and (2), these probabilities become

$$Q = p_1 \int_{\sum_j M_{1j}} f_1\, dE + \cdots + p_r \int_{\sum_j M_{rj}} f_r\, dE$$

and

$$Q' = p_1 \int_{\sum_i M_{i1}} f_1\, dE + \cdots + p_r \int_{\sum_i M_{ir}} f_r\, dE.$$

Now consider the difference $Q - Q'$. It can be expressed in the form

$$Q - Q' = \sum_{i=1}^{r} \sum_{j=1}^{r} \left[ p_i \int_{M_{ij}} f_i\, dE - p_j \int_{M_{ij}} f_j\, dE \right]
= \sum_{i=1}^{r} \sum_{j=1}^{r} \int_{M_{ij}} [p_i f_i - p_j f_j]\, dE.$$

Since $M_{ij}$ is contained in $M_i$ and $p_i f_i \geq p_j f_j$, $(j = 1, \cdots, r)$, holds throughout
$M_i$, it follows that each of these integrals is non-negative; consequently $Q \geq Q'$,
which proves the theorem.

This theorem yields a solution to the classification problem only when the $f_i$
are completely specified and the $p_i$ are known.

It will be observed that this theorem is similar to a generalization of a fundamental
lemma in the Neyman-Pearson theory of testing hypotheses [1], and to a
result by Welch [2].

If the basic weight function in Wald’s [3] formulation of the multiple decision 
problem assumes only the values 0 and 1, corresponding to whether or not a 
correct classification is made, it will be found that the set of regions Mi will 
minimize the expected value of the loss in that formulation. 
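The rule of Theorem 1 can be sketched in a few lines: a point is assigned to the population with the largest value of $p_i f_i$, and any tie goes to the smallest index. The sketch below assumes univariate normal densities purely for illustration; the helper names `normal_pdf` and `classify` are hypothetical, and indices are zero-based as is usual in Python.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution (an illustrative choice of f_i)."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def classify(x, priors, densities):
    """Return the zero-based index i maximizing p_i * f_i(x).

    Ties are resolved in favor of the smallest index, as in Theorem 1.
    Returns None where every p_i * f_i(x) vanishes.
    """
    scores = [p * f(x) for p, f in zip(priors, densities)]
    best = max(scores)
    if best <= 0.0:
        return None
    return scores.index(best)  # list.index returns the first (smallest) index


# Two hypothetical populations with known parameters and equal p_i.
priors = [0.5, 0.5]
densities = [lambda x: normal_pdf(x, 0.0, 1.0),
             lambda x: normal_pdf(x, 2.0, 1.0)]
```

With these assumed parameters the boundary between the two regions lies at $x = 1$, and the boundary point itself is assigned to the first population.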

4. Parameters unknown. Since the $p_i$, as well as the parameters in the $f_i$,
are assumed to be unknown, $Q$ will be a function of such parameters. Let $\theta_1, \cdots, \theta_s$
denote all such parameters, including the $p_i$. Now let a random sample of size $n$
be taken from the region under consideration and let $\hat\theta_1, \cdots, \hat\theta_s$ denote a set of
estimates of the parameters based on this sample. Since the total sample will
constitute a sample of size $n_1$ from $f_1$, $n_2$ from $f_2$, etc., where $n = n_1 + \cdots + n_r$,
the $\theta$'s for $f_i$ will be estimated by means of a sample of size $n_i$ rather than of size $n$.
In the following arguments, it will not be necessary to distinguish between $\theta$'s
which are estimated by different size samples because the arguments will be
based on the order of terms with respect to the size of the sample and $n_i \sim np_i$ with
probability one. Or, more simply, choose all $n_i$ equal.

Let $\hat M_i$ correspond to $M_i$ when the parameters are replaced by their sample
estimates and let $\hat Q$ denote the probability of a correct classification when using
the regions $\hat M_i$ in place of the regions $M_i$. Then, from (3),

$$Q - \hat Q = \sum_{i=1}^{r} p_i \left[ \int_{M_i} f_i\, dE - \int_{\hat M_i} f_i\, dE \right].$$

Let $H = Q - \hat Q$. Since the estimates, $\hat\theta_i$, are random variables, $H$ will be a
random variable which is a function of the estimation functions, $\hat\theta_i$, as well as
of the parameters, $\theta_i$. The desired criterion for determining optimum estimates
is then given by the following theorem:

Theorem 2. If $E(\hat\theta_i - \theta_i)^4 = O(n^{-g})$, $g > 0$, and if in some neighborhood of the
point $\hat\theta_i = \theta_i$, $(i = 1, \cdots, s)$, the function $H$ is continuous and possesses continuous
derivatives of the first, second, and third order with respect to the $\hat\theta_i$, then

$$E(H) = \frac{1}{2} \sum_{i=1}^{s} \sum_{j=1}^{s} H_{ij}\, E(\hat\theta_i - \theta_i)(\hat\theta_j - \theta_j) + O(n^{-3g/4}),$$

where $H_{ij}$ denotes the partial derivative of $H$ with respect to $\hat\theta_i$ and $\hat\theta_j$ at the point
$(\theta_1, \cdots, \theta_s)$.

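The proof is similar to the type of proof used by Cramér [4] to obtain an
expression for the variance of a function of central moments.

By means of Tchebycheff's inequality [4], page 182, it follows that

$$P[(\hat\theta_i - \theta_i)^2 > \epsilon^2] \leq \frac{E(\hat\theta_i - \theta_i)^4}{\epsilon^4}.$$

From the theorem assumptions, there exists a constant $A$ such that

$$P[(\hat\theta_i - \theta_i)^2 > \epsilon^2] \leq \frac{A n^{-g}}{\epsilon^4}.$$

This is equivalent to

$$P[\,|\hat\theta_i - \theta_i| > \epsilon\,] \leq \frac{A n^{-g}}{\epsilon^4}.$$

If $E_1$ denotes the set of points in sample space where $|\hat\theta_i - \theta_i| \leq \epsilon$, $(i = 1, \cdots, s)$,
and $E_2$ denotes the complementary set, this inequality implies that

$$P[E_2] \leq \frac{sA n^{-g}}{\epsilon^4}. \tag{4}$$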



The expected value of $H$ may be written in the form

$$E(H) = \int_{E_1} H\, dP + \int_{E_2} H\, dP. \tag{5}$$

Consider the order of the second integral. From (4) and the fact that $H$ is the
difference of two probabilities, it follows that

$$\left| \int_{E_2} H\, dP \right| \leq \int_{E_2} dP = P[E_2] \leq \frac{sA n^{-g}}{\epsilon^4}.$$

Consequently (5) becomes

$$E(H) = \int_{E_1} H\, dP + O(n^{-g}). \tag{6}$$

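Now consider the first integral. From the theorem assumptions, if $\epsilon$ is chosen
sufficiently small, it follows that for any point in the set $E_1$, the function $H$
can be expanded in the form

$$H = H(\theta) + \sum_{i=1}^{s} (\hat\theta_i - \theta_i) H_i(\theta) + \frac{1}{2} \sum_{i=1}^{s} \sum_{j=1}^{s} (\hat\theta_i - \theta_i)(\hat\theta_j - \theta_j) H_{ij}(\theta) + R,$$

where $\theta$ denotes the point $(\theta_1, \cdots, \theta_s)$, where

$$R = \frac{1}{6} \sum_{i=1}^{s} \sum_{j=1}^{s} \sum_{k=1}^{s} (\hat\theta_i - \theta_i)(\hat\theta_j - \theta_j)(\hat\theta_k - \theta_k) H_{ijk}(\theta'),$$

and where $\theta'$ is some point in $E_1$. Since $\hat Q$ reduces to $Q$ when $\hat\theta = \theta$, $H(\theta) = 0$.
Furthermore, since $Q$ denotes the maximum probability of a correct classification,
$H \geq 0$ for all $\hat\theta$; hence $H_i(\theta) = 0$ and $H_{ii}(\theta) \geq 0$ for all $i$. Thus, for any point
in the set $E_1$,

$$H = \frac{1}{2} \sum_{i=1}^{s} \sum_{j=1}^{s} (\hat\theta_i - \theta_i)(\hat\theta_j - \theta_j) H_{ij}(\theta) + R.$$

If this expression is substituted in (6), $E(H)$ will become

$$E(H) = \frac{1}{2} \sum_{i=1}^{s} \sum_{j=1}^{s} H_{ij}(\theta) \int_{E_1} (\hat\theta_i - \theta_i)(\hat\theta_j - \theta_j)\, dP + \int_{E_1} R\, dP + O(n^{-g}). \tag{7}$$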

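Consider, first, the order of the remainder term. From the continuity assumption
on $H_{ijk}$, it follows that $H_{ijk}$ is bounded in $E_1$, say $|H_{ijk}(\theta')| < B$; hence

$$\left| \int_{E_1} (\hat\theta_i - \theta_i)(\hat\theta_j - \theta_j)(\hat\theta_k - \theta_k) H_{ijk}(\theta')\, dP \right| < B \int_{E_1} |(\hat\theta_i - \theta_i)(\hat\theta_j - \theta_j)(\hat\theta_k - \theta_k)|\, dP.$$

By Schwarz's inequality,

$$\int_{E_1} |(\hat\theta_i - \theta_i)(\hat\theta_j - \theta_j)(\hat\theta_k - \theta_k)|\, dP
\leq \left[ \int_{E_1} (\hat\theta_i - \theta_i)^2 (\hat\theta_j - \theta_j)^2\, dP \int_{E_1} (\hat\theta_k - \theta_k)^2\, dP \right]^{1/2}.$$

Similarly,

$$\int_{E_1} (\hat\theta_i - \theta_i)^2 (\hat\theta_j - \theta_j)^2\, dP \leq \left[ \int_{E_1} (\hat\theta_i - \theta_i)^4\, dP \int_{E_1} (\hat\theta_j - \theta_j)^4\, dP \right]^{1/2},$$

$$\int_{E_1} (\hat\theta_k - \theta_k)^2\, dP \leq \left[ \int_{E_1} (\hat\theta_k - \theta_k)^4\, dP \int_{E_1} dP \right]^{1/2} \leq \left[ \int_{E_1} (\hat\theta_k - \theta_k)^4\, dP \right]^{1/2}.$$

Since

$$\int_{E_1} (\hat\theta_i - \theta_i)^4\, dP \leq \int_{E_1 + E_2} (\hat\theta_i - \theta_i)^4\, dP = O(n^{-g}),$$

the preceding inequalities combine to give

$$\int_{E_1} R\, dP = O(n^{-3g/4}). \tag{8}$$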

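Now consider the first integral in (7). It may be written in the form

$$\int_{E_1} (\hat\theta_i - \theta_i)(\hat\theta_j - \theta_j)\, dP = E(\hat\theta_i - \theta_i)(\hat\theta_j - \theta_j) - \int_{E_2} (\hat\theta_i - \theta_i)(\hat\theta_j - \theta_j)\, dP. \tag{9}$$

By Schwarz's inequality,

$$\left| \int_{E_2} (\hat\theta_i - \theta_i)(\hat\theta_j - \theta_j)\, dP \right| \leq \left[ \int_{E_2} (\hat\theta_i - \theta_i)^2\, dP \int_{E_2} (\hat\theta_j - \theta_j)^2\, dP \right]^{1/2}.$$

Similarly,

$$\int_{E_2} (\hat\theta_i - \theta_i)^2\, dP \leq \left[ \int_{E_2} (\hat\theta_i - \theta_i)^4\, dP \cdot P[E_2] \right]^{1/2}.$$

If these inequalities are combined and inequality (4) is employed, (9) will
reduce to

$$\int_{E_1} (\hat\theta_i - \theta_i)(\hat\theta_j - \theta_j)\, dP = E(\hat\theta_i - \theta_i)(\hat\theta_j - \theta_j) + O(n^{-g}). \tag{10}$$

Finally, if (8) and (10) are employed in (7), it will reduce to the result stated
in the theorem.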

The order of the leading term in $E(H)$ depends upon the nature of the
estimating functions, $\hat\theta_i$. In order to insure that this term will be the dominating
term, and thus rule out pathological situations, only that class of estimating
functions (estimators) will be considered for which this term will be of lower
order than that of the remainder term. If the estimators are means or central
moments, for example, then $g = 2$. For such estimators the order of the remainder
term is $O(n^{-3/2})$, whereas the order of the leading term is not higher than $O(n^{-1})$.

A set of estimators will be called an optimum set if it maximizes the expected
value of the probability of a correct classification, or, what is equivalent, if it
minimizes $E(H)$. Since only large samples are being considered here, it is
necessary to define optimum in an asymptotic sense. Consider sets of estimators for
which $E(H)$ is of order $O(n^{-q})$. For this class of estimators, a set will be called
asymptotically optimum if it minimizes

$$\lim_{n \to \infty} n^{q} E(H).$$

Among asymptotically optimum sets of various orders, the set corresponding
to the highest order would naturally be considered as the best asymptotic set.
Now from Theorem 2, it readily follows that a set of estimators which minimizes

$$\sum_{i=1}^{s} \sum_{j=1}^{s} H_{ij}\, E(\hat\theta_i - \theta_i)(\hat\theta_j - \theta_j) \tag{11}$$

will be an asymptotically optimum set.

5. Maximum likelihood estimates. If the estimates $\hat\theta_i$ are unbiased and
uncorrelated, (11) will reduce to

$$\sum_{i=1}^{s} H_{ii}\, \sigma_i^2, \tag{12}$$

where $\sigma_i^2 = E(\hat\theta_i - \theta_i)^2$ is a function of $n$ as well as of the parameters. Since, from
the discussion preceding (7), $H_{ii} \geq 0$, it follows that (12) will be a minimum when
the $\sigma_i^2$ assume their minimum values. Now it is known [4], page 504, that under
mild restrictions maximum likelihood estimates possess minimum asymptotic
variances; hence for estimators of the type being considered which also satisfy the
conditions in [4], the maximum likelihood estimates of the $\theta_i$ will yield an
asymptotically optimum set of estimates for the classification problem.
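As a rough sketch of how the two theorems combine in practice, the unknown parameters can be replaced by maximum likelihood estimates and the rule of Theorem 1 applied with the estimated $\hat p_i \hat f_i$. The example assumes univariate normal populations, which the paper does not require; the names `fit_normal_mle` and `classify_plug_in` are hypothetical.

```python
import math


def fit_normal_mle(samples_by_population):
    """Maximum likelihood estimates (p_hat, mean_hat, var_hat) per population.

    Assumes each population is univariate normal (an illustrative assumption).
    The variance uses divisor n_i, which is the maximum likelihood form.
    """
    n = sum(len(s) for s in samples_by_population)
    fitted = []
    for s in samples_by_population:
        n_i = len(s)
        mean = sum(s) / n_i
        var = sum((x - mean) ** 2 for x in s) / n_i
        fitted.append((n_i / n, mean, var))
    return fitted


def classify_plug_in(x, fitted):
    """Assign x to the zero-based index maximizing p_hat_i * f_hat_i(x)."""
    scores = [
        p * math.exp(-((x - m) ** 2) / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)
        for p, m, v in fitted
    ]
    return scores.index(max(scores))  # first index wins on ties


# Hypothetical training samples from two populations.
fitted = fit_normal_mle([[0.0, 1.0, -1.0, 0.0], [4.0, 5.0, 6.0, 5.0]])
```

The regions obtained this way play the role of the $\hat M_i$ of Section 4; the asymptotic optimality claim applies only under the conditions stated there.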

REFERENCES 

[1] J. Neyman and E. S. Pearson, "On the problem of the most efficient tests of statistical
hypotheses," Roy. Soc. Phil. Trans., Vol. 231 (1933), pp. 289-337.

[2] B. L. Welch, "Note on discriminant functions," Biometrika, Vol. 31 (1939), pp. 218-220.

[3] A. Wald, "Contributions to the theory of statistical estimation and testing
hypotheses," Annals of Math. Stat., Vol. 10 (1939), pp. 299-304.

[4] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, 1946,
pp. 352-356.



NOTES 

This section is devoted to brief research and expository articles on methodology and
other short items.

A GENERALIZATION OF WALD'S FUNDAMENTAL IDENTITY

By Gunnar Blom
University of Stockholm

1. Summary. The fundamental identity is generalized to the case of independent 
random variables with non-identical distributions. The conditions for the 
validity of the differentiation of the identity are discussed. The results given in 
[1], [2], and [3] are obtained as special cases. 

2. A property of cumulative sums. Let $z_1, z_2, \cdots$ be an infinite sequence of
independent random variables, $F_1(z), F_2(z), \cdots$ their distribution functions (d.f.)
and $\varphi_1(t), \varphi_2(t), \cdots$ their moment-generating functions so that $\varphi_\nu(t) = E(e^{t z_\nu})$.
$a_N$ and $b_N$ are given constants ($a_N > b_N$, $N = 1, 2, \cdots$). $n$ is defined as the
smallest integer $N$ for which $Z_N = z_1 + \cdots + z_N$ is $\geq a_N$ or $\leq b_N$.

We first give two lemmas. 

Lemma 1. If two positive quantities $\delta$ and $\epsilon$ can be found such that one at least
of the following conditions a) and b) is satisfied

a) $P(z_\nu > \delta) > \epsilon$ for all $\nu$ and $\limsup_{N \to \infty} a_N < \infty$,
b) $P(z_\nu < -\delta) > \epsilon$ for all $\nu$ and $\liminf_{N \to \infty} b_N > -\infty$,

then for any $k \geq 0$

$$\lim_{N \to \infty} N^k P(n > N) = 0. \tag{1}$$

An inspection of the proof of (4) in [4] shows that this formula holds when the
conditions of the lemma are satisfied. The lemma follows.

Lemma 1 can be generalized as follows. 

Lemma 2. If two positive quantities $\delta$ and $\epsilon$ and a sequence $c_1, c_2, \cdots$ can be
found such that one at least of the following conditions a) and b) is satisfied

a) $P(z_\nu + c_\nu > \delta) > \epsilon$ for all $\nu$, $\limsup_{N \to \infty} a_N < \infty$, $\limsup_{N \to \infty} \sum_{\nu=1}^{N} c_\nu < \infty$,

b) $P(z_\nu + c_\nu < -\delta) > \epsilon$ for all $\nu$, $\liminf_{N \to \infty} b_N > -\infty$, $\liminf_{N \to \infty} \sum_{\nu=1}^{N} c_\nu > -\infty$,

then (1) is true.

Proof: In case a) we put $z'_\nu = z_\nu + c_\nu$, $Z'_N = \sum_{\nu=1}^{N} z'_\nu$ and $a'_N = a_N + \sum_{\nu=1}^{N} c_\nu$.
The inequality $Z_N \geq a_N$ then becomes $Z'_N \geq a'_N$. As $P(z'_\nu > \delta) > \epsilon$ and
$\limsup a'_N < \infty$, Lemma 1 can be applied to the sequence $z'_1, z'_2, \cdots$, and thus (1) is
true. When conditions b) are satisfied, the proof is analogous.

3. The generalized fundamental identity. In this section we shall consider
sequences of random variables of the type defined in Lemma 2. We shall prove
two theorems the first of which is valid for complex values of $t$ and the second
only for real values of $t$.

Theorem 1. Assuming that

1°. one at least of conditions a) and b) of Lemma 2 is satisfied;

2°. $b \leq b_N < a_N \leq a$, where $a$ and $b$ are finite;

3°. for some complex (or real) value of $t$, $\varphi_\nu(t)$ exists for all $\nu$ and is $\neq 0$ and
$\liminf_{N \to \infty} |\varphi_1(t) \cdots \varphi_N(t)| > 0$,

then

$$E\left[ e^{t Z_n} (\varphi_1(t) \cdots \varphi_n(t))^{-1} \right] = 1. \tag{2}$$

Proof. Let $W_m$ denote the set of all sequences $z_1, \cdots, z_N$ in the $N$-dimensional
Euclidean space $\Omega_N$ for which $n = m$ $(m \leq N)$, $W'_m$ the projection of $W_m$ on $\Omega_m$
and $W_{n>N}$ all sequences for which $n > N$. We have identically

$$\left[ \sum_{m=1}^{N} \int_{W_m} + \int_{W_{n>N}} \right] e^{t Z_N}\, dF_1 \cdots dF_N = \int_{\Omega_N} e^{t Z_N}\, dF_1 \cdots dF_N = \varphi_1(t) \cdots \varphi_N(t).$$

Dividing by the right member and cancelling common factors we obtain

$$\sum_{m=1}^{N} \int_{W'_m} e^{t Z_m} (\varphi_1 \cdots \varphi_m)^{-1}\, dF_1 \cdots dF_m + (\varphi_1 \cdots \varphi_N)^{-1} \int_{W_{n>N}} e^{t Z_N}\, dF_1 \cdots dF_N = 1. \tag{3}$$

When $N \to \infty$ the first sum tends to the left member of (2). We thus have to
investigate the last term in (3) which we denote by $R_N$. We can write

$$R_N = (\varphi_1 \cdots \varphi_N)^{-1} \int_{W_{n>N}} e^{t Z_N}\, dF_1 \cdots dF_N = (\varphi_1 \cdots \varphi_N)^{-1} P(n > N)\, E_{n>N}(e^{t Z_N}). \tag{4}$$

It follows from Lemma 2 that $P(n > N) \to 0$. As $b < Z_N < a$ by 2° we conclude
that $R_N \to 0$. This proves the theorem.

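Theorem 2. If, for some real value of $t$, $\varphi_\nu(t)$ exists for all $\nu$ and if quantities
$c_\nu$, $\epsilon > 0$ and $\delta > 0$ can be found such that at least one of the following conditions
a) and b) is satisfied for all $\nu$

a) $\limsup_{N \to \infty} a_N < \infty$, $\limsup_{N \to \infty} \sum_{\nu=1}^{N} c_\nu < \infty$ and

$$A_\nu(t, \delta) = \frac{1}{\varphi_\nu(t)} \int_{\delta - c_\nu}^{\infty} e^{tz}\, dF_\nu(z) > \epsilon, \qquad (\nu = 1, 2, \cdots), \tag{5a}$$

b) $\liminf_{N \to \infty} b_N > -\infty$, $\liminf_{N \to \infty} \sum_{\nu=1}^{N} c_\nu > -\infty$ and

$$B_\nu(t, \delta) = \frac{1}{\varphi_\nu(t)} \int_{-\infty}^{-\delta - c_\nu} e^{tz}\, dF_\nu(z) > \epsilon, \qquad (\nu = 1, 2, \cdots), \tag{5b}$$

then (2) holds.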

The conditions of the theorem become more attractive if the theorem is 
limited to the somewhat less general cases mentioned in the Corollary below. 
The above formulation has been chosen mainly because of an important applica¬ 
tion to identical variables in Sec. 6. 

Proof. The theorem is proved if we can show that $R_N$ in (4) tends to zero when
$N \to \infty$. For that purpose we use the transformation (cf. [5] and [3])

$$G_\nu(z; t) = \frac{1}{\varphi_\nu(t)} \int_{-\infty}^{z} e^{tz}\, dF_\nu(z), \qquad (\nu = 1, 2, \cdots). \tag{6}$$

$G_\nu(z; t)$ is obviously a d.f. for every real $t$ (for which $\varphi_\nu(t)$ exists). When (5a)
holds,

$$P[z_\nu + c_\nu > \delta \mid G_\nu(z; t)] = A_\nu(t, \delta).$$

Here the expression in the left member denotes the probability that $z_\nu + c_\nu > \delta$,
when $G_\nu$ is the d.f. of $z_\nu$.

Consequently, when conditions a) are fulfilled, a sequence of random variables
with the d.f.'s $G_1(z; t), G_2(z; t), \cdots$ or, with one notation, $G(t)$ satisfies the
conditions a) of Lemma 2. It follows that

$$\lim_{N \to \infty} P(n > N \mid G(t)) = 0.$$

Introducing $G_\nu(z; t)$ in $R_N$ we find

$$R_N = \int_{W_{n>N}} dG_1 \cdots dG_N = P(n > N \mid G(t)).$$

Consequently $R_N \to 0$. When conditions b) are fulfilled, the proof is analogous.

Corollary to Theorem 2. If 1° $\varphi_\nu(t) e^{t c_\nu} \leq H(t) < \infty$, 2° $t$ is positive and
conditions a) of Lemma 2 hold or $t$ is negative and conditions b) of Lemma 2 hold,
then the generalized fundamental identity is true.

For, in the first case

$$A_\nu(t, \delta) \geq \frac{e^{t(\delta - c_\nu)}}{\varphi_\nu(t)} \int_{\delta - c_\nu}^{\infty} dF_\nu(z) > \frac{\epsilon\, e^{t\delta}}{H(t)},$$

so that (5a) is satisfied, and similarly when $t$ is negative.

The following special case deserves particular attention as it covers most
cases occurring in practice and the conditions become very simple: If a sequence
of random variables satisfies conditions a) and b) of Lemma 1 simultaneously, a
sufficient condition for the validity of (2) for some given real value of $t$ is that the
sequence $\varphi_\nu(t)$ is bounded.

4. Application to Poisson variables. As an application of (2) we consider a
sequence of Poisson variables with the parameters $\lambda m_\nu$, where $\lambda$ is a positive
quantity and $m_\nu$ are positive integers. From the well-known formula

$$\varphi_\nu(t) = e^{\lambda m_\nu (e^t - 1)}$$

we easily conclude that the conditions of Theorem 1 are valid if $R(e^t) \geq 1$. (With
$\delta < 1$ in (5a) we find that (2) holds even for negative $t$.) If, in particular, we
choose $t$ so that $e^t = 1 + \dfrac{2\pi i k}{\lambda} = c_k$, we have the simple formula

$$E(c_k^{Z_n}) = 1, \qquad (k = 1, 2, \cdots).$$
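The identity (2) can be checked numerically in a simple Poisson case. The sketch below assumes identically distributed Poisson variables with parameter `lam` (so every $m_\nu = 1$), a constant upper limit `a`, and a lower limit that is never reached because the variables are non-negative; these choices, and the function name `wald_identity_poisson`, are assumptions made for illustration. It sums $e^{t Z_n} \varphi(t)^{-n}$ over all stopping paths by recursion on the value of $Z_N$ before stopping.

```python
import math


def wald_identity_poisson(lam, t, a, max_steps=5000):
    """Evaluate E[exp(t*Z_n) * phi(t)**(-n)] for i.i.d. Poisson(lam) steps.

    n is the first N with Z_N >= a (a is a positive integer); the lower
    limit is taken negative and is therefore never attained.
    """
    phi = math.exp(lam * (math.exp(t) - 1.0))        # m.g.f. of Poisson(lam)
    pmf = [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(a)]
    # w[z] = P(Z_N = z, n > N) * phi**(-N); initially Z_0 = 0.
    w = [0.0] * a
    w[0] = 1.0
    total = 0.0
    for _ in range(max_steps):
        new_w = [0.0] * a
        for z, wz in enumerate(w):
            if wz == 0.0:
                continue
            stay = 0.0                               # sum of p_k e^{tk} over non-stopping jumps
            for k in range(a - z):
                new_w[z + k] += wz * pmf[k] / phi
                stay += pmf[k] * math.exp(t * k)
            # Jumps with z + k >= a stop the process; their m.g.f. mass is phi - stay.
            total += wz * math.exp(t * z) * (phi - stay) / phi
        w = new_w
        if sum(w) < 1e-15:
            break
    return total


value_neg = wald_identity_poisson(1.0, -0.5, 5)
value_pos = wald_identity_poisson(1.0, 0.3, 5)
```

Both values should agree with 1 up to rounding error, for negative as well as positive real $t$, in line with the remark in parentheses above.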


5. Differentiation of the generalized fundamental identity. In this section $t$
is assumed to be real. We denote the $k$th derivative of $\varphi_\nu(t)$ by $\varphi_\nu^{(k)}(t)$. We shall
prove the following theorem which corresponds to Theorems 1 and 2.

Theorem 3. If for all $t$ in a closed interval $I$ the conditions stated in Theorems
1 or 2 are satisfied and if, in addition, the functions $\varphi_\nu^{(k)}(t) / \varphi_\nu(t)$ are uniformly bounded
with respect to both $\nu$ and $t$ (in $I$) for $k = 1, 2, \cdots, r$, then the generalized
fundamental identity may be differentiated $r$ times with respect to $t$ for any $t$ in the interior
of $I$.

We use a method of proof which is similar to that used in [2]. We first show
that the sum in (3) may be differentiated $r$ times under the integral signs and
secondly that the $r$th derivative of $R_N$ tends to zero uniformly in $t$ when $N \to \infty$.

The $r$th derivative of the general term of the series in (3) consists of a finite
number of terms of the form
JM) = (<Pi--- dFi • •. dF„ (m ^ X;M,X = 1,2, •.. r), 

and the $r$th derivative of $R_N$ in (4) consists of a finite number (which does not
depend on $N$) of similar expressions with $N$ substituted for $m$ and $W_{n>N}$ for
$W'_m$. $H_m$ is a sum of $m^\lambda$ and $N^\lambda$ terms respectively which is symmetric in $\nu$.
The terms are functions of $\varphi_\nu^{(k)}(t) / \varphi_\nu(t)$ $(k \leq \lambda;\ \nu = 1, 2, \cdots, m)$ and are thus
majorated by the same constant $C$.

Further, we can always find a positive quantity $t_0$ such that for all $t$ in $I$


Hence 


1 g ^ (e*"*" + e-*"*") 


JM) I g (v>i • • • Cm^ f , (e*'*" + e-'*'") dFi... dF„ 

Jw- 


The rest of the proof is divided into two parts corresponding to the conditions
of Theorem 1 and those of Theorem 2.

When the conditions of Theorem 2 are fulfilled we make the transformation
(6) in (7) with $t = t_0$ and $t = -t_0$. Then

$$|J_m(t)| \leq C m^\lambda [P(n = m \mid G(t_0)) + P(n = m \mid G(-t_0))] \leq 2C m^\lambda < \infty.$$

This justifies the differentiation of the series in (3).

Substituting $N$ for $m$ and $n > N$ for $n = m$ in the above expression we further
have

$$|J_N(t)| \leq C N^\lambda [P(n > N \mid G(t_0)) + P(n > N \mid G(-t_0))],$$

and conclude from Lemma 2 with $k = \lambda$ in (1) that $J_N(t)$ tends to zero uniformly
in $t$. It follows that the $r$th derivative of $R_N$ also tends to zero uniformly in $t$.

In the second part of the proof we assume the conditions of Theorem 1 to be
satisfied. We then write (7) in the following form

$$|J_m(t)| \leq C (\varphi_1 \cdots \varphi_m)^{-1} m^\lambda P(n = m)\, E_m(e^{t_0 Z_m} + e^{-t_0 Z_m}), \tag{8}$$

where $E_m$ signifies the conditional expectation when it is known that $n = m$.
From the definition of $n$ it follows that, when $n = m$, we have $b_{m-1} < Z_{m-1} < a_{m-1}$
and $Z_m \geq a_m$ or $\leq b_m$. Hence

^ I Z„ ^ aj = I ^ a„] 

^ i z„ > o„ - 6„_il < «>. 

The second exponential can be treated in a similar way. Thus $J_m(t)$ is majorated
by a finite expression.

Finally, we substitute $N$ for $m$ and $n > N$ for $n = m$ in (8). $I$ being a closed
interval it follows from condition 3° in Theorem 1 that we can find a constant
$C$ such that

1 MO 1 g CN'‘P(n > + c" 

From the definition of $n$ and condition 2° in Theorem 1 we have $b < Z_N < a$.
An application of Lemma 2 then shows that $J_N(t)$ tends to zero uniformly in $t$.
This proves the theorem.

Corollary to Theorem 3. When the conditions stated in Corollary of Theorem 2 
are fulfilled for all t in the closed interval I, Theorem 3 is true. 

This is obvious. 

6. The fundamental identity for identically distributed variables. In the
special case of identically distributed variables for which $P(z = 0) < 1$ and
$0 < \varphi(t) < \infty$ we infer from Theorem 1 that the fundamental identity

$$E\left[ e^{t Z_n} \varphi(t)^{-n} \right] = 1 \tag{9}$$

holds if $t$ is complex and $|\varphi(t)| \geq 1$. This is the case discussed in [1].

Further, when $P(z = 0) < 1$, the integrals $\int_{a}^{\infty} e^{tz}\, dF$ and $\int_{-\infty}^{b} e^{tz}\, dF$ cannot both
be zero for every $a > 0$ and $b < 0$, and thus we infer from Theorem 2 that the
fundamental identity holds for all real $t$ (if the limits $a_N$ and $b_N$ are chosen in
accordance with the conditions of this theorem). This proposition is somewhat
more general than that proved in [3] by a similar method.

It also follows from the last remark and Theorem 3 that, when $P(z = 0) < 1$,
(9) can be differentiated any number of times for any real $t$. This proposition
contains the results in [2] and [3] as special cases.

7. A generalization. We finally remark that the assumption made in Theorem
3 that the expressions containing derivatives of $\varphi_\nu(t)$ are uniformly bounded is
unnecessarily restrictive. For example, it seems possible to prove that the first
derivative of (2) may be obtained by differentiation under the expectation
sign if the series (cf. Corollary 1 to Theorem 7.4 in [6])


r-1 




is uniformly convergent with respect to $t$.


REFERENCES 

[1] A. Wald, "On cumulative sums of random variables," Annals of Math. Stat., Vol. 15
(1944), p. 286.

[2] A. Wald, "Differentiation under the expectation sign in the fundamental identity of
sequential analysis," Annals of Math. Stat., Vol. 17 (1946), pp. 493-497.

[3] G. E. Albert, "A note on the fundamental identity of sequential analysis," Annals of
Math. Stat., Vol. 18 (1947), pp. 593-596 and Vol. 19 (1948), pp. 426-427.

[4] C. Stein, "A note on cumulative sums," Annals of Math. Stat., Vol. 17 (1946), pp. 498-499.

[5] H. Cramér, "Sur un nouveau théorème-limite de la théorie des probabilités," Actualités
scientifiques et industrielles, no. 736, Hermann et Cie., 1938, p. 5.

[6] J. Wolfowitz, "The efficiency of sequential estimates and Wald's equation for
sequential processes," Annals of Math. Stat., Vol. 18 (1947), pp. 228-229.


SPREAD OF MINIMA OF LARGE SAMPLES

By Brockway McMillan
Bell Telephone Laboratories, Murray Hill, N. J.

1. Theorems. Let $x$ have the continuous cumulative distribution function
$F(x)$. Let $(x_1, \cdots, x_N)$ be a sample of $N$ independent values of $x$ and
$y = \inf(x_1, \cdots, x_N)$. Then $y$ is a random variable with the cumulative distribution
function

$$G_N(y) = 1 - (1 - F(y))^N. \tag{1}$$

Let $K$ values of the new variable $y$ be drawn, $(y_1, \cdots, y_K)$, and let the spread
$w = \sup(y_1, \cdots, y_K) - \inf(y_1, \cdots, y_K)$.

Fixing $K$, we consider the cumulative distribution function of $w$, $P_N(w)$, as
$N \to \infty$. That is, we have $K$ large samples of $x$ and wish to examine the spread
among their minima. It is evident intuitively that if $F(x) = 0$ for some finite $x$,
these minima are bounded from below and will cluster near the vanishing point
of $F(x)$, making $w \to 0$ statistically as $N \to \infty$. Our theorems also show that
even when $y \to -\infty$ statistically, i.e., when $F(x) = 0$ for no finite $x$, the spread
$w \to 0$ statistically if the tail of $F(x)$ is sufficiently small (e.g. Gaussian). On
the other hand, if $F(x) = O(e^{ax})$ as $x \to -\infty$, the distribution $P_N(w)$ does not
peak as $N \to \infty$, while for larger tails (e.g. algebraic) $w \to \infty$ statistically.

Two simple theorems are

I. If

$$\lim_{x \to -\infty} \frac{F(x)}{F(x + s)} = 1,$$

then

$$\lim_{N \to \infty} P_N(s) = 0.$$

II. Let $s > 0$. If $F(x_0) = 0$ for some $x_0 > -\infty$, or if

$$\lim_{x \to -\infty} \frac{F(x)}{F(x + s)} = 0,$$

then

$$\lim_{N \to \infty} P_N(s) = 1.$$

Theorem I is directly applicable to distributions with algebraic tails, theorem II
to Gaussian tails. We prove them both as corollaries of the more general results:

III. If

$$\liminf_{x \to -\infty} \frac{F(x)}{F(x + s)} = l,$$

then

$$\limsup_{N \to \infty} P_N(s) \leq (1 - l)^{K-1}.$$

IV. Let $s > 0$. If $F(x) = 0$ for no finite $x$ and

$$\limsup_{x \to -\infty} \frac{F(x)}{F(x + s)} = L,$$

then

$$\liminf_{N \to \infty} P_N(s) \geq \left[ e^{-\alpha} - e^{-\alpha/L} \right]^K$$

for any $\alpha > 0$.

Theorems III and IV together show that an exponential tail ($F(x) = O(e^{ax})$)
leads to a $P_N(w)$ which, asymptotically, is bounded away from 0 for any $w > 0$
and bounded away from 1 for $w$ sufficiently small.
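The behavior for an exponential tail can be illustrated numerically. The sketch below assumes the particular distribution $F(x) = e^{x}$ for $x \leq 0$ and evaluates $P_N(s) = K \int [G_N(x+s) - G_N(x)]^{K-1}\, dG_N(x+s)$ by the midpoint rule in the variable $U = G_N(x + s)$. The function name, the choice of $F$, and the step count are assumptions made for illustration. For $K = 2$ and large $N$ the value should be close to $\tanh(s/2)$, the probability that the difference of two independent Gumbel-type variables is at most $s$ in absolute value; that limit is a standard fact supplied here and is not stated in the note.

```python
import math


def spread_cdf_exponential_tail(s, K, N, steps=20000):
    """Approximate P_N(s) for F(x) = exp(x), x <= 0, by the midpoint rule.

    The integration variable is U = G_N(x + s), which runs over (0, 1).
    """
    c = math.exp(-s)                      # F(x) = c * F(x + s) for this F
    total = 0.0
    for i in range(steps):
        U = (i + 0.5) / steps
        # Invert G_N: F(x + s) = 1 - (1 - U)**(1/N), computed stably.
        F_upper = -math.expm1(math.log1p(-U) / N)
        # G_N(x) = 1 - (1 - F(x))**N with F(x) = c * F(x + s).
        G_lower = -math.expm1(N * math.log1p(-c * F_upper))
        total += K * (U - G_lower) ** (K - 1)
    return total / steps


p = spread_cdf_exponential_tail(s=1.0, K=2, N=10000)
```

With $s = 1$ the computed probability stays strictly between 0 and 1, as the remark on exponential tails suggests.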

2. Proofs. Explicitly, for any $s > 0$,

$$P_N(s) = K \int_{-\infty}^{\infty} [G_N(x + s) - G_N(x)]^{K-1}\, dG_N(x + s). \tag{2}$$

Turning now to III: given $\epsilon > 0$, choose $x_1 = x_1(\epsilon)$ so that (i) $F(x_1) \neq 0$, and
(ii) $x \leq x_1$ implies

$$\frac{F(x)}{F(x + s)} \geq l - \epsilon. \tag{3}$$

We then rewrite (2) as

$$P_N(s) = \int_{-\infty}^{x_1} \left[ 1 - \frac{G_N(x)}{G_N(x + s)} \right]^{K-1} dG_N(x + s)^K + \int_{x_1}^{\infty}. \tag{4}$$

Treating $G_N(x + s)^K$ as the independent variable, the first integral may be
evaluated by the mean value theorem in the form

$$\left[ 1 - \frac{G_N(x_2)}{G_N(x_2 + s)} \right]^{K-1} \int_{-\infty}^{x_1} dG_N(x + s)^K \leq \left[ 1 - \frac{G_N(x_2)}{G_N(x_2 + s)} \right]^{K-1} \tag{5}$$

with an appropriate $x_2 = x_2(N)$, $-\infty \leq x_2 \leq x_1$.

Using the form (2) of the integrand in the second term of (4), we may bound
the latter by

$$K \int_{x_1}^{\infty} dG_N(x + s) \leq K[1 - G_N(x_1 + s)], \tag{6}$$

since

$$G_N(x + s) - G_N(x) \leq 1.$$

Now, by factoring (1),

$$\frac{G_N(x)}{G_N(x + s)} = \frac{F(x)}{F(x + s)} \cdot \frac{1 + Q + \cdots + Q^{N-1}}{1 + Q_s + \cdots + Q_s^{N-1}} \geq \frac{F(x)}{F(x + s)}, \tag{7}$$

where $Q = 1 - F(x)$, $Q_s = 1 - F(x + s) \leq Q$. Combining (3), (4), (5), (6),
and (7),

$$P_N(s) \leq [1 - l + \epsilon]^{K-1} + K[1 - G_N(x_1 + s)].$$

Since $F(x_1 + s) \geq F(x_1) > 0$, we have

$$\lim_{N \to \infty} G_N(x_1 + s) = 1.$$

Hence,

$$\limsup_{N \to \infty} P_N(s) \leq [1 - l + \epsilon]^{K-1}$$

and III follows by letting $\epsilon \to 0$. Then I follows immediately with $l = 1$, when
we note that $P_N(s) \geq 0$.

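To prove IV, choose any $\alpha > 0$. By hypothesis, for sufficiently large $N$ we may always find $x_N = x_N(\alpha)$ such that

$$F(x_N) = \frac{\alpha}{N}. \tag{8}$$

By hypothesis, and the monotonicity of $F(x)$, $x_N \to -\infty$ as $N \to \infty$. For any $\epsilon > 0$, therefore, we can find $N_0 = N_0(\alpha, \epsilon)$ such that $N > N_0$ implies

$$\frac{F(x_N)}{F(x_N+s)} \le \frac{L}{1-\epsilon}, \tag{9}$$

or $F(x_N+s) \ge \frac{\alpha}{NL}(1-\epsilon)$. Directly from (2), since $s > 0$,

$$P_N(s) \ge K\int_{x_N-s}^{x_N}\left[G_N(x+s) - G_N(x)\right]^{K-1} dG_N(x+s) \ge K\int_{x_N-s}^{x_N}\left[G_N(x+s) - G_N(x_N)\right]^{K-1} dG_N(x+s).$$

But this last integral is of the form

$$\int K(U - C)^{K-1}\,dU = (U - C)^{K},$$

whence

$$P_N(s) \ge \left[G_N(x_N+s) - G_N(x_N)\right]^{K},$$

or

$$P_N(s) \ge \left[(1 - F(x_N))^{N} - (1 - F(x_N+s))^{N}\right]^{K}. \tag{10}$$

By (8) and (9), therefore

$$P_N(s) \ge \left[\left(1 - \frac{\alpha}{N}\right)^{N} - \left(1 - \frac{\alpha(1-\epsilon)}{NL}\right)^{N}\right]^{K}.$$

Since this holds for all $N > N_0(\alpha, \epsilon)$,

$$\liminf_{N\to\infty} P_N(s) \ge \left[e^{-\alpha} - e^{-\alpha(1-\epsilon)/L}\right]^{K}.$$

This last, in turn, now holds for any $\epsilon > 0$, hence

$$\liminf_{N\to\infty} P_N(s) \ge \left[e^{-\alpha} - e^{-\alpha/L}\right]^{K}.$$

This now holds for any $\alpha > 0$. Maximizing on $\alpha$ yields a sharper bound than the result of IV. The applicable part of II follows, when $L = 0$, by letting $\alpha \to 0$. That the conclusion of II holds when $F(x_0) = 0$ for some finite $x_0$ follows from (10) with $x_N$ replaced by some $x_1$ such that $F(x_1) = 0$, $F(x_1+s) > 0$.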





ON THE CONVERGENCE OF THE CLASSICAL ITERATIVE METHOD
OF SOLVING LINEAR SIMULTANEOUS EQUATIONS¹

By Edgar Reich
Massachusetts Institute of Technology

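The classical iterative method, or Seidel method, is a scheme for solving the
system of linear algebraic equations

$$\sum_{j=1}^{n} A_{ij}x_j = b_i, \qquad (i = 1, 2, \cdots, n),$$

by successive approximation, as follows:

If $x^{(\nu)} = (x_1^{(\nu)}, x_2^{(\nu)}, \cdots, x_n^{(\nu)})$ is the $\nu$th approximation of the solution, the
$(\nu+1)$st approximation, $x^{(\nu+1)} = (x_1^{(\nu+1)}, x_2^{(\nu+1)}, \cdots, x_n^{(\nu+1)})$, is obtained from
the relations

$$\begin{aligned}
A_{11}x_1^{(\nu+1)} + A_{12}x_2^{(\nu)} + A_{13}x_3^{(\nu)} + \cdots + A_{1n}x_n^{(\nu)} &= b_1,\\
A_{21}x_1^{(\nu+1)} + A_{22}x_2^{(\nu+1)} + A_{23}x_3^{(\nu)} + \cdots + A_{2n}x_n^{(\nu)} &= b_2,\\
A_{31}x_1^{(\nu+1)} + A_{32}x_2^{(\nu+1)} + A_{33}x_3^{(\nu+1)} + \cdots + A_{3n}x_n^{(\nu)} &= b_3,\\
\cdots\\
A_{n1}x_1^{(\nu+1)} + A_{n2}x_2^{(\nu+1)} + A_{n3}x_3^{(\nu+1)} + \cdots + A_{nn}x_n^{(\nu+1)} &= b_n,
\end{aligned}$$

$x_1^{(\nu+1)}$ being obtained from the first equation, then $x_2^{(\nu+1)}$ from the second, and
so on.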

The given system can be written in matrix notation as $Ax = b$ where $A$ is
a non-singular square matrix of order $n$, and $x$ and $b$ are column vectors of order $n$.
Let us define square matrices $A_1$ and $A_2$ as follows:

$$(A_1)_{ij} = \begin{cases} A_{ij} & \text{if } i \ge j\\ 0 & \text{if } i < j\end{cases}
\qquad
(A_2)_{ij} = \begin{cases} A_{ij} & \text{if } i < j\\ 0 & \text{if } i \ge j\end{cases}$$

(Note that $A_1 + A_2 = A$.)

With this notation the Seidel method can be written as the matric difference
equation

$$A_1x^{(\nu+1)} + A_2x^{(\nu)} = b.$$
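The sweep described above can be sketched in a few lines of Python. This is an illustration only, not part of the original paper: the function names `seidel_step` and `seidel_solve`, the number of sweeps, and the 3 × 3 test system are all assumptions chosen for the example. Each sweep solves the difference equation for the new vector by forward substitution, using already-updated components for the part of the row on or below the diagonal and old components for the part above it.

```python
def seidel_step(A, b, x):
    """One Seidel sweep: solve A1*x_new + A2*x_old = b by forward substitution."""
    n = len(b)
    x_new = list(x)
    for i in range(n):
        # Components j < i were already updated in this sweep (strictly lower part of A1).
        s = sum(A[i][j] * x_new[j] for j in range(i))
        # Components j > i still hold the previous approximation (the A2 part).
        s += sum(A[i][j] * x[j] for j in range(i + 1, n))
        x_new[i] = (b[i] - s) / A[i][i]
    return x_new


def seidel_solve(A, b, sweeps=50):
    """Repeat the sweep a fixed number of times, starting from the zero vector."""
    x = [0.0] * len(b)
    for _ in range(sweeps):
        x = seidel_step(A, b, x)
    return x


# Hypothetical symmetric, positive definite test system with exact solution (1, 1, 1).
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [5.0, 5.0, 3.0]
x = seidel_solve(A, b)
print(x)
```

A fixed sweep count keeps the sketch short; a practical routine would more likely stop when successive approximations agree to a chosen tolerance.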


Now various writers, among them C. E. Berry in this journal (see list of references
at end of this paper), have shown that a necessary and sufficient condition
for convergence, i.e., a necessary and sufficient condition for

$$\lim_{\nu\to\infty}\left(x_i^{(\nu)} - x_i\right) = 0, \qquad (i = 1, 2, \cdots, n),$$

is that

(1) $A_1$ has an inverse; that is, $A_{ii} \ne 0$ for any $i$.

(2) The characteristic roots of $(A_1^{-1}A_2)$ all have an absolute value smaller
than unity.

¹ Work done under Office of Naval Research Contract N6ori60.

It would be advantageous to rephrase the above condition, if possible, in terms
of simpler requirements on $A$. As a step in this direction the following theorem
is offered:

Theorem. If $A$ is a real, symmetric nth-order matrix with all terms on its main
diagonal positive, then a necessary and sufficient condition for all the n characteristic
roots of $(A_1^{-1}A_2)$ to be smaller than unity in magnitude is that $A$ is positive
definite.
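As an illustration only (it is not part of the original argument), the statement of the theorem can be checked by hand when $n = 2$. The entries $a$, $b$, $d$ below are symbols introduced just for this sketch; $A_1$ and $A_2$ are the lower-triangular (with diagonal) and strictly upper-triangular parts of $A$ as defined earlier.

```latex
\begin{align*}
A &= \begin{pmatrix} a & b \\ b & d \end{pmatrix}, \quad a > 0,\ d > 0, \qquad
A_1 = \begin{pmatrix} a & 0 \\ b & d \end{pmatrix}, \qquad
A_2 = \begin{pmatrix} 0 & b \\ 0 & 0 \end{pmatrix}, \\[4pt]
A_1^{-1} &= \begin{pmatrix} 1/a & 0 \\ -b/(ad) & 1/d \end{pmatrix}, \qquad
A_1^{-1}A_2 = \begin{pmatrix} 0 & b/a \\ 0 & -b^2/(ad) \end{pmatrix}, \\[4pt]
\mu_1 &= 0, \qquad \mu_2 = -\frac{b^2}{ad}, \\[4pt]
|\mu_2| < 1 &\iff b^2 < ad \iff \det A > 0 \iff A \text{ is positive definite (given } a > 0\text{)}.
\end{align*}
```

For instance, $a = d = 1$, $b = 2$ gives a symmetric matrix with positive diagonal that is not positive definite, and $\mu_2 = -4$, so the iteration cannot converge in general.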

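Proof. Let $z_i$ be a characteristic vector of $(A_1^{-1}A_2)$ corresponding to the
characteristic root $\mu_i$. Then

$$(A_1^{-1}A_2)z_j = \mu_j z_j. \tag{1}$$

Premultiplying by $\bar z_i' A_1$, where the apostrophe and bar denote transposition
and conjugation respectively:

$$\bar z_i' A_2 z_j = \mu_j \bar z_i' A_1 z_j. \tag{2}$$

Consider the bilinear form $\bar z_i' A z_j$.

We have

$$\bar z_i' A z_j = \bar z_i' A_1 z_j + \bar z_i' A_2 z_j = (1 + \mu_j)\,\bar z_i' A_1 z_j. \tag{3}$$

Interchanging $i$ and $j$:

$$\bar z_j' A z_i = (1 + \mu_i)\,\bar z_j' A_1 z_i. \tag{4}$$

Taking the conjugate:

$$\overline{\bar z_j' A z_i} = \bar z_i' A z_j = (1 + \bar\mu_i)\,z_j' A_1 \bar z_i = (1 + \bar\mu_i)\,\bar z_i' A_1' z_j. \tag{5}$$

Let $D$ be the diagonal matrix with elements

$$D_{ij} = A_{ij}\delta_{ij}. \tag{6}$$

This makes $A_1' = D + A_2$.

Substituting this in (5):

$$\bar z_i' A z_j = (1 + \bar\mu_i)(\bar z_i' D z_j + \bar z_i' A_2 z_j) = (1 + \bar\mu_i)\,\bar z_i' D z_j + (1 + \bar\mu_i)\mu_j\,\bar z_i' A_1 z_j. \tag{7}$$

Eliminating $\bar z_i' A_1 z_j$ between relations (3) and (7) we obtain

$$(1 - \bar\mu_i\mu_j)\,\bar z_i' A z_j = (1 + \bar\mu_i)(1 + \mu_j)\,\bar z_i' D z_j. \tag{8}$$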




To obtain the necessary condition we use the fact that we must have $|\mu_i| < 1$,
and can therefore rewrite (8) as

$$\bar z_i' A z_j = \frac{(1 + \bar\mu_i)(1 + \mu_j)}{1 - \bar\mu_i\mu_j}\,\bar z_i' D z_j = \sum_{k=0}^{\infty}(1 + \bar\mu_i)\bar\mu_i^{\,k}(1 + \mu_j)\mu_j^{\,k}\,\bar z_i' D z_j. \tag{9}$$

If $x = \sum_{i=1}^{m} c_i z_i$ is any linear combination of the $m \le n$ independent characteristic
vectors of $(A_1^{-1}A_2)$ then

$$\bar x' A x = \Big(\sum_{i=1}^{m}\bar c_i \bar z_i\Big)' A \Big(\sum_{j=1}^{m} c_j z_j\Big) = \sum_{i,j=1}^{m}\bar c_i c_j\,\bar z_i' A z_j
= \sum_{i,j=1}^{m}\bar c_i c_j\sum_{k=0}^{\infty}(1 + \bar\mu_i)\bar\mu_i^{\,k}(1 + \mu_j)\mu_j^{\,k}\,\bar z_i' D z_j,$$

or

$$\bar x' A x = \sum_{k=0}^{\infty}\bar y_k' D y_k \tag{10}$$

where

$$y_k = \sum_{i=1}^{m} c_i(1 + \mu_i)\mu_i^{\,k} z_i.$$

Since by hypothesis $A_{ii} > 0$, $D$ is evidently positive definite, and therefore

$$\bar x' A x > 0. \tag{11}$$

In case the characteristic roots $\mu_i$, $(i = 1, 2, \cdots, n)$, are all distinct there will be $n$
independent $z_i$ assured, and in that case (11) implies that $A$ is positive definite.
Consider, on the other hand, the case where the $\mu_i$ are not all distinct. Note
that (a) the definiteness properties of a matrix are not changed by sufficiently
small alterations in the elements; (b) the $\mu$'s depend continuously on the elements
of $A$; (c) the discriminant of (1) is a polynomial in the $A_{ij}$ that does not vanish
identically.² It follows that $A$ must be positive definite even in the case of repeated
roots because an arbitrarily small change in $A$ will separate any multiple
$\mu$'s, still keeping them smaller than unity in magnitude, and not changing the
definiteness properties of $A$.

This completes the proof that the condition given in the statement of the
theorem is necessary. Now to prove sufficiency:

Setting $i = j$ in relation (8) we obtain

$$(1 - |\mu_i|^2)\,\bar z_i' A z_i = |1 + \mu_i|^2\,\bar z_i' D z_i. \tag{12}$$

Since both $A$ and $D$ are positive definite

$$\bar z_i' A z_i > 0 \quad\text{and}\quad \bar z_i' D z_i > 0. \tag{13}$$

Moreover, we cannot have $\mu_i = -1$ because that would mean by (3) that

$$0 = \bar z_i' A_1 z_i + \bar z_i' A_2 z_i = \bar z_i' A z_i.$$

Relation (12) thus implies

$$1 - |\mu_i|^2 > 0, \tag{14}$$

i.e. $|\mu_i| < 1$ as was to be proved.

The part of the theorem giving the sufficient condition was already obtained
by L. Seidel [1] and G. Temple in a somewhat more indirect fashion.

² The fact that the discriminant is not identically zero follows from easily constructible
counter-examples.

REFERENCES 

[1] L. Seidel, "Über ein Verfahren die Gleichungen, auf welche die Methode der kleinsten Quadrate führt, sowie lineare Gleichungen überhaupt, durch successive Annäherung aufzulösen," Abhandlungen der Mathematisch-Physikalischen Classe der Königlich Bayerischen Akademie der Wissenschaften, Vol. 11 (1874), pp. 81-108.

[2] C. E. Berry, "A criterion of convergence for the classical iterative method of solving linear simultaneous equations," Annals of Math. Stat., Vol. 16 (1945), pp. 398-400.

[3] L. Cesari, "Sulla risoluzione dei sistemi di equazioni lineari per approssimazioni successive," Rassegna delle Poste, dei Telegrafi e dei Telefoni, Anno 9 (1931).

[4] L. Cesari, "Sulla risoluzione dei sistemi di equazioni lineari per approssimazioni successive," Reale Accademia Nazionale dei Lincei, Serie 6, Classe di Scienze fisiche, matematiche e naturali, Rendiconti, Vol. 25 (1937), pp. 422-428.

[5] J. Morris, The Escalator Method in Engineering Vibration Problems, Chapman and Hall Ltd., London 1947, pp. 63-70.

[6] K. J. Schmidt, "On the numerical solution of linear simultaneous equations by an iterative method," Phil. Mag., Ser. 7, Vol. 32 (1941), pp. 369-383.


SOME RECURRENCE FORMULAE IN THE INCOMPLETE BETA
FUNCTION RATIO

By T. A. Bancroft
Alabama Polytechnic Institute

1. Introduction. It is well known that the incomplete beta function ratio,
defined by

$$I_x(p, q) = \frac{B_x(p, q)}{B(p, q)}, \tag{1}$$

where

$$B_x(p, q) = \int_0^x x^{p-1}(1 - x)^{q-1}\,dx \tag{2}$$

and

$$B(p, q) = B_1(p, q), \tag{3}$$

is of importance in probability distribution theory, and, hence, also in obtaining
exact probability values in making tests of statistical hypotheses. In constructing
certain extensions [1] of Karl Pearson's "Tables of the Incomplete Beta-Function" [2],
the recurrence formulae contained in the following sections were derived.

2. Derivation of formulae. The incomplete beta function, $B_x(p, q)$, may be
considered as a special case of the hypergeometric series, $F(a, b, c, x)$, thus

$$B_x(p, q) = \frac{x^p}{p}\,F(p,\ 1 - q,\ p + 1,\ x). \tag{4}$$

The series converges for $|x| \le 1$, if and only if $a + b < c$. By setting $a = p$,
$b = 1 - q$, and $c = p + 1$, as in (4), all conditions are satisfied, if we also take
$q > 0$.

Recurrence formulae for $F(a, b, c, x)$, e.g., in the work of Magnus and Oberhettinger [3],
may now be directly converted for use with $B_x(p, q)$ or $I_x(p, q)$.
In particular, using the three identities on page 9 of [3], with $x$ replacing $z$, we
have

$$cF(a, b, c, x) + (b - c)F(a+1, b, c+1, x) - b(1 - x)F(a+1, b+1, c+1, x) = 0, \tag{5}$$

$$c(c - ax - b)F(a, b, c, x) - c(c - b)F(a, b-1, c, x) + abx(1 - x)F(a+1, b+1, c+1, x) = 0, \tag{6}$$

$$cF(a, b, c, x) - cF(a, b+1, c, x) + axF(a+1, b+1, c+1, x) = 0. \tag{7}$$

With $a = p$, $b = 1 - q$, and $c = p + 1$, we obtain in turn

$$xI_x(p, q) - I_x(p+1, q) + (1 - x)I_x(p+1, q-1) = 0, \tag{8}$$

$$(p + q - px)I_x(p, q) - qI_x(p, q+1) - p(1 - x)I_x(p+1, q-1) = 0, \tag{9}$$

$$qI_x(p, q+1) + pI_x(p+1, q) - (p + q)I_x(p, q) = 0. \tag{10}$$

Formula (8) is the basic recurrence formula used in the construction of Karl
Pearson's [2] tables. Formula (10) was obtained, incidentally, by the author [4]
in a different connection and manner.

Formulae (8), (9), and (10) may now be combined to give other useful formulae,
e.g.,

$$qI_x(p+1, q+1) + (px + qx - q)I_x(p+1, q) - (p + q)xI_x(p, q) = 0, \tag{11}$$

$$pI_x(p+1, q+1) + (q - px - qx)I_x(p, q+1) - (p + q)(1 - x)I_x(p, q) = 0, \tag{12}$$

$$(p + q - 1)xI_x(p-1, q) - \{(p + q - 1)x + p\}I_x(p, q) + pI_x(p+1, q) = 0, \tag{13}$$

$$(p + q)(1 - x)I_x(p+1, q-1) - \{(p + q)(1 - x) + q\}I_x(p+1, q) + qI_x(p+1, q+1) = 0. \tag{14}$$

Notice that the sum of the coefficients is always zero.

By a repeated use of (10) it is possible to obtain the formulae

$$I_x(p + n, q) = \frac{1}{(p + n - 1)^{(n)}}\sum_{r=0}^{n}(-1)^r\binom{n}{r}(p + q + n - 1)^{(n-r)}(q + r - 1)^{(r)}\,I_x(p, q + r), \tag{15}$$

$$I_x(p, q + n) = \frac{1}{(q + n - 1)^{(n)}}\sum_{r=0}^{n}(-1)^r\binom{n}{r}(p + q + n - 1)^{(n-r)}(p + r - 1)^{(r)}\,I_x(p + r, q), \tag{16}$$

where $(p + q + n - 1)^{(n-r)}$, etc., refer to the factorial notation, e.g.,

$$[p + q + (n - 1)]^{(n-r)} = (p + q + n - 1)(p + q + n - 2)\cdots(p + q + r).$$
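The following Python sketch is offered as a hedged numerical check of formula (15), not as the author's computing method. It assumes integer $p$ and $q$, for which $I_x(p, q)$ equals a binomial tail sum; the helper names `inc_beta_ratio`, `falling`, and `raise_p` are hypothetical, and the binomial coefficient in the sum follows the reading of (15) in which the $r$th term carries $\binom{n}{r}$.

```python
from math import comb


def inc_beta_ratio(x, p, q):
    """I_x(p, q) for positive integers p, q, via the binomial tail sum."""
    n = p + q - 1
    return sum(comb(n, j) * x**j * (1.0 - x)**(n - j) for j in range(p, n + 1))


def falling(a, k):
    """Factorial power a^(k) = a (a - 1) ... (a - k + 1); equals 1 when k = 0."""
    out = 1
    for i in range(k):
        out *= a - i
    return out


def raise_p(x, p, q, n):
    """Right-hand side of (15): I_x(p + n, q) expressed through I_x(p, q + r)."""
    total = 0.0
    for r in range(n + 1):
        total += ((-1)**r * comb(n, r)
                  * falling(p + q + n - 1, n - r)
                  * falling(q + r - 1, r)
                  * inc_beta_ratio(x, p, q + r))
    return total / falling(p + n - 1, n)


direct = inc_beta_ratio(0.60, 52, 48)   # direct evaluation
via_15 = raise_p(0.60, 50, 48, 2)       # built from I(50,48), I(50,49), I(50,50)
print(direct, via_15)
```

The two printed numbers should agree to rounding error, which is the point of the check; the same helpers can be used to confirm formula (10) or the complement identity $I_x(p, q) = 1 - I_{1-x}(q, p)$.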

3. An application. Formulae (15) and (16) may be used to write general
formulae for obtaining values of $I_x(p, q)$ where $p$ or $q$ may be greater than 50,
i.e., for such values outside the range of Karl Pearson's tables. In particular,

$$I_x(50 + n, q) = \frac{1}{(49 + n)^{(n)}}\Big[(n + q + 49)^{(n)}I_x(50, q) - \binom{n}{1}q\,(n + q + 49)^{(n-1)}I_x(50, q+1) + \cdots + (-1)^n(q + n - 1)^{(n)}I_x(50, q + n)\Big], \tag{17}$$

and

$$I_x(p, 50 + n) = \frac{1}{(49 + n)^{(n)}}\Big[(n + p + 49)^{(n)}I_x(p, 50) - \binom{n}{1}p\,(n + p + 49)^{(n-1)}I_x(p+1, 50) + \cdots + (-1)^n(p + n - 1)^{(n)}I_x(p + n, 50)\Big]. \tag{18}$$

It should be noted for (17) that as $n$ increases the range of values that can be
obtained outside Karl Pearson's tables is reduced, since the last term of (17)
contains $I_x(50, q + n)$. A similar observation holds for (18). From a practical
standpoint the computational labor restricts $n$ to fairly small values. Using (17)
we may easily compute, for example,

$$I_{.60}(52, 48) = I_{.60}(50 + 2, 48) = \frac{1}{(51)(50)}\big[(99)(98)I_{.60}(50, 48) - 2(99)(48)I_{.60}(50, 49) + (49)(48)I_{.60}(50, 50)\big].$$

Substituting the necessary values from Karl Pearson's tables we calculate

$$I_{.60}(52, 48) = .9465248.$$

Similarly using (18) we may calculate

$$I_{.40}(48, 52) = .0534752.$$

As a check on the computations, we use the well-known identity

$$I_x(p, q) = 1 - I_{1-x}(p', q'),$$

where $p' = q$ and $q' = p$. Then

$$I_{.40}(48, 52) = 1 - I_{.60}(52, 48) = 1 - .9465248 = .0534752.$$
In like manner formulae (15) and (16) may be used to write general formulae
for obtaining half values for $p$ or $q$ greater than 10.5, i.e., for values not included
in Karl Pearson's tables. In particular,

$$I_x(10.5 + n, q) = \frac{1}{(9.5 + n)^{(n)}}\Big[(9.5 + q + n)^{(n)}I_x(10.5, q) - \binom{n}{1}q\,(9.5 + q + n)^{(n-1)}I_x(10.5, q+1) + \cdots + (-1)^n(q + n - 1)^{(n)}I_x(10.5, q + n)\Big], \tag{19}$$

and

$$I_x(p, 10.5 + n) = \frac{1}{(9.5 + n)^{(n)}}\Big[(9.5 + p + n)^{(n)}I_x(p, 10.5) - \binom{n}{1}p\,(9.5 + p + n)^{(n-1)}I_x(p+1, 10.5) + \cdots + (-1)^n(p + n - 1)^{(n)}I_x(p + n, 10.5)\Big]. \tag{20}$$

Using (19) we may compute

$$I_{.60}(12.5, 8) = \frac{1}{(11.5)(10.5)}\big[(19.5)^{(2)}I_{.60}(10.5, 8) - 2(8)(19.5)I_{.60}(10.5, 9) + (9)(8)I_{.60}(10.5, 10)\big] = .4512367.$$

Similarly using (20) we obtain

$$I_{.40}(8, 12.5) = .5487633.$$

Employing the check formula,

$$I_{.40}(8, 12.5) = 1 - I_{.60}(12.5, 8) = 1 - .4512367 = .5487633.$$





Thanks are due to Dr. J. C. P. Miller, Technical Director, Scientific Computing
Service, Limited, London, England, for helpful suggestions in the preparation
of this paper.


REFERENCES 

[1] T. A. Bancroft, "Some extensions of the incomplete beta function tables" (in preparation).

[2] Karl Pearson, Tables of the Incomplete Beta-Function, Cambridge University Press, 1934.

[3] Wilhelm Magnus and Fritz Oberhettinger, Formeln und Sätze für die Speziellen Funktionen der Mathematischen Physik, Julius Springer, Berlin, 1943.

[4] T. A. Bancroft, "On biases in estimation due to the use of preliminary tests of significance," Annals of Math. Stat., Vol. 15 (1944).


ON A THEOREM BY WALD AND WOLFOWITZ

By Gottfried E. Noether
New York University

Let $\mathfrak{H}_n = (h_1, \cdots, h_n)$, $(n = 1, 2, \cdots)$, be sequences of real numbers and for
all $n$ denote by $H_{i_1\cdots i_m}$ the symmetrical function generated by $h_1^{i_1}\cdots h_m^{i_m}$,
i.e., $H_{i_1\cdots i_m} = \sum h_{\alpha_1}^{i_1}\cdots h_{\alpha_m}^{i_m}$ where the summation is extended over the $n(n-1)\cdots(n-m+1)$
possible arrangements of the $m$ integers $\alpha_1, \cdots, \alpha_m$, such that
$1 \le \alpha_j \le n$ and $\alpha_j \ne \alpha_k$, $(j \ne k;\ j, k = 1, \cdots, m)$. According to Wald and Wolfowitz
[1] the sequences $\mathfrak{H}_n$ are said to satisfy condition W, if for all integral $r > 2$

$$\frac{\frac{1}{n}\sum_{i=1}^{n}(h_i - \bar h)^r}{\left[\frac{1}{n}\sum_{i=1}^{n}(h_i - \bar h)^2\right]^{r/2}} = O(1),^{1}$$

where $\bar h = \frac{1}{n}\sum_{i=1}^{n} h_i$.

Given sequences $\mathfrak{A}_n = (a_1, \cdots, a_n)$ and $\mathfrak{D}_n = (d_1, \cdots, d_n)$, consider the
chance variable

$$L_n = d_1x_1 + \cdots + d_nx_n,$$

where the domain of $(x_1, \cdots, x_n)$ consists of the $n!$ equally likely permutations
of the elements of $\mathfrak{A}_n$. Then it is shown in [1] that if the sequences $\mathfrak{A}_n$ and $\mathfrak{D}_n$
satisfy condition W, the distribution of $L_n^0 = (L_n - EL_n)/\sigma(L_n)$ approaches the
normal distribution with mean 0 and variance 1 as $n \to \infty$. These conditions
for asymptotic normality can be weakened. It will be shown that the following
theorem holds:

¹ The symbol $O$, as well as the symbols $o$ and $\sim$ to be used later, have their usual meaning.
See e.g. Cramér [2, p. 122].

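Theorem. $L_n^0$ is asymptotically normal with mean 0 and variance 1 provided the
sequences $\mathfrak{D}_n$ satisfy condition W while for the sequences $\mathfrak{A}_n$

$$\frac{\sum_{i=1}^{n}(a_i - \bar a)^r}{\left[\sum_{i=1}^{n}(a_i - \bar a)^2\right]^{r/2}} = o(1), \qquad (r = 3, 4, \cdots). \tag{1}$$

We note that $L_n^0$ is not changed if $a_i$ is replaced by $\left[\frac{1}{n}\sum_{j=1}^{n}(a_j - \bar a)^2\right]^{-1/2}(a_i - \bar a)$
and $d_i$ by $\left[\frac{1}{n}\sum_{j=1}^{n}(d_j - \bar d)^2\right]^{-1/2}(d_i - \bar d)$. Therefore it is sufficient to prove
asymptotic normality provided

$$D_1 = 0, \quad D_2 = n, \quad D_r = O(n), \qquad (r = 3, 4, \cdots); \tag{2}$$

$$A_1 = 0, \quad A_2 = n, \quad A_r = o(n^{r/2}), \qquad (r = 3, 4, \cdots). \tag{3}$$

Then

$$EL_n = D_1Ex_1 = 0,$$

$$\operatorname{var} L_n = EL_n^2 = D_2Ex_1^2 + D_{11}Ex_1x_2 = \frac{1}{n}D_2A_2 + \frac{1}{n(n-1)}D_{11}A_{11} \sim n,$$

and it is sufficient to show that $n^{-r/2}EL_n^r$ tends to the $r$th moment of a normal
distribution with mean 0 and variance 1.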

Now we can write

$$\mu_r = n^{-r/2}EL_n^r = n^{-r/2}\sum_{i_1=1}^{n}\cdots\sum_{i_r=1}^{n}E\,d_{i_1}x_{i_1}\cdots d_{i_r}x_{i_r}$$

$$= n^{-r/2}\big[D_rEx_1^r + \cdots + c(r, c_1, \cdots, c_m)D_{c_1\cdots c_m}Ex_1^{c_1}\cdots x_m^{c_m} + \cdots + D_{1\cdots1}Ex_1\cdots x_r\big] \tag{4}$$

where $c_1 + \cdots + c_m = r$ with $c_k$, $(k = 1, \cdots, m)$, positive integral and the
coefficient $c(r, c_1, \cdots, c_m)$ stands for the number of ways in which the $r$ indices
$i_1, \cdots, i_r$ can be tied in $m$ groups of size $c_1, \cdots, c_m$, respectively, so as to
produce the terms of $D_{c_1\cdots c_m}Ex_1^{c_1}\cdots x_m^{c_m}$.

Since $Ex_1^{c_1}\cdots x_m^{c_m} \sim n^{-m}A_{c_1\cdots c_m}$, we have

$$n^{-r/2}D_{c_1\cdots c_m}Ex_1^{c_1}\cdots x_m^{c_m} \sim n^{-(r/2+m)}D_{c_1\cdots c_m}A_{c_1\cdots c_m} = B(r, c_1, \cdots, c_m), \text{ say}. \tag{5}$$

Lemma. $B(r, c_1, \cdots, c_m) \sim 0$ unless

$$m = r/2, \qquad c_1 = \cdots = c_{r/2} = 2. \tag{6}$$

In that case $B(r, 2, \cdots, 2) \sim 1$.

Before proving this lemma we shall show that our theorem follows immediately.
By (4) $\mu_r$ is the sum of a finite number of expressions $B(r, c_1, \cdots, c_m)$.
Therefore if $r = 2s + 1$, $(s = 1, 2, \cdots)$, $\mu_{2s+1} \sim 0$, since at least one of the $c_k$,
$(k = 1, \cdots, m)$, in all the $B(2s+1, c_1, \cdots, c_m)$ adding up to $\mu_{2s+1}$ must be odd.
If $r = 2s$, $\mu_{2s} \sim c(2s, 2, \cdots, 2)$. Since the first index in (4) can be tied with any
one of the other $2s - 1$ indices, the next free index with any one of the remaining
$2s - 3$ indices, etc., it is seen that $\mu_{2s} \sim (2s - 1)(2s - 3)\cdots3$. However these
are the moments of a normal distribution with mean 0 and variance 1. This
proves the theorem.
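The counting step in the argument above (tying $2s$ indices into groups of two) can be checked by brute force. The sketch below is only an illustration and uses hypothetical helper names: `count_pairings` enumerates the ways of tying the indices in pairs by always pairing the first free index with one of the remaining ones, and `normal_even_moment` evaluates the product $(2s-1)(2s-3)\cdots3\cdot1$.

```python
def count_pairings(indices):
    """Number of ways to tie the given indices into groups of size two."""
    if not indices:
        return 1
    rest = indices[1:]  # the first index must be tied to one of these
    total = 0
    for k in range(len(rest)):
        # Tie the first index to rest[k]; pair up whatever is left over.
        total += count_pairings(rest[:k] + rest[k + 1:])
    return total


def normal_even_moment(s):
    """(2s - 1)(2s - 3) ... 3 * 1, the moment of order 2s of a unit normal variable."""
    out = 1
    for odd in range(1, 2 * s, 2):
        out *= odd
    return out


for s in range(1, 6):
    print(2 * s, count_pairings(list(range(2 * s))), normal_even_moment(s))
```

For each $s$ the two printed counts should coincide, which is the combinatorial fact the proof relies on.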

Proof of Lemma. Define $A(j_1, \cdots, j_h) = A_{j_1}\cdots A_{j_h}$. Then $A_{c_1\cdots c_m}$ is the
sum of a finite number of expressions $A(j_1, \cdots, j_h)$ where the $j_g$, $(g = 1, \cdots, h)$,
are obtained from $c_1, \cdots, c_m$ by addition in such a way that

$$j_1 + \cdots + j_h = c_1 + \cdots + c_m = r. \tag{7}$$

Since by (3) $A_1 = 0$, we need only consider those $A(j_1, \cdots, j_h)$ for which
$j_g \ge 2$, $(g = 1, \cdots, h)$. If some $j_g > 2$, by (3) and (7)

$$A(j_1, \cdots, j_h) = o(n^{r/2}). \tag{8}$$

If $j_1 = \cdots = j_h = 2$,

$$A(2, \cdots, 2) = n^{r/2}. \tag{9}$$

This last case can only happen if $r$ is even and $c_k$, $(k = 1, \cdots, m)$, equals either
1 or 2. Therefore, unless (6) is true

$$m > r/2. \tag{10}$$

Similarly, writing $D_{c_1\cdots c_m}$ as a sum of products of the kind $D_{j_1}\cdots D_{j_h}$ it is
seen that by (2)

$$D_{c_1\cdots c_m} = \begin{cases} O(n^{m}) & \text{if } m \le r/2\\ O(n^{r/2}) & \text{if } m > r/2.\end{cases} \tag{11}$$

Thus by (8)-(11)

$$D_{c_1\cdots c_m}A_{c_1\cdots c_m} = o(n^{r/2+m}) \tag{12}$$

unless (6) is true. In that case

$$A_{2\cdots2} \sim A_2^{r/2} = n^{r/2}, \tag{13}$$

$$D_{2\cdots2} \sim D_2^{r/2} = n^{r/2}. \tag{14}$$

(12)-(14) together with (5) prove the lemma.

Let $a_1, a_2, \cdots$ be independent observations on the same chance variable $Y$.
We may ask what conditions have to be imposed on the distribution of $Y$ to
insure, at least with probability 1, that condition (1) is satisfied. Wald and
Wolfowitz state in Corollary 2 of [1] that provided $Y$ has positive variance and
finite moments of all orders the $a_1, a_2, \cdots$ satisfy condition W with probability
1 and therefore insure asymptotic normality of $L_n^0$ provided the sequences $\mathfrak{D}_n$
satisfy condition W. On the other hand, it can be shown that the $a_1, a_2, \cdots$
satisfy condition (1) with probability 1, provided $Y$ has positive variance and
a finite absolute moment of order 3. Thus condition (1) constitutes a considerable
improvement over condition W.


REFERENCES 

[1] A. Wald and J. Wolfowitz, "Statistical tests based on permutations of the observations," Annals of Math. Stat., Vol. 15 (1944), pp. 358-372.

[2] H. Cramér, Mathematical Methods of Statistics, Princeton, 1946.


ON SUMS OF SYMMETRICALLY TRUNCATED NORMAL RANDOM
VARIABLES

By Z. W. Birnbaum and F. C. Andrews¹
University of Washington, Seattle


1. Introduction. Let $X_a$ be the random variable with the probability density

$$f_a(x) = \begin{cases} Ce^{-x^2/2} & \text{for } |x| \le a\\ 0 & \text{for } |x| > a,\end{cases} \tag{1.1}$$

obtained from the normal probability density $\frac{1}{\sqrt{2\pi}}e^{-x^2/2}$ by symmetrical truncation
at the "terminus" $|x| = a$, and let $S_a^{(m)}$ be the sum of $m$ independent sample-values
of $X_a$. We consider the following problem: An integer $m \ge 2$ and the real
numbers $A > 0$, $\epsilon > 0$ are given; how does one have to choose the terminus $a$
so that the probability of $|S_a^{(m)}| > A$ is equal to $\epsilon$,

$$P(|S_a^{(m)}| > A) = \epsilon? \tag{1.2}$$


This problem arises for example when single components of a product are
manufactured under statistical quality control, so that each component has the
length $Z = k + X$ where $X$ has the probability density $\frac{1}{\sqrt{2\pi}}e^{-x^2/2}$, and the final
product consists of $m$ components so that its total length $S$ is the sum of the
lengths of the components. We wish to have probability $1 - \epsilon$ that $S$ differs
from $mk$ by not more than a given $A$. To achieve this we decide to reject each
single component for which $|Z - k| = |X| > a$; how do we determine $a$?

The exact solution of this problem would require laborious computations.²
In the present paper methods are given for obtaining approximate values of $a$
which are "safe", that is such that

$$P(|S_a^{(m)}| > A) \le \epsilon. \tag{1.3}$$

In deriving these safe values, use will be made of theorems on random variables
with comparable peakedness, for which the reader is referred to a previous
paper [1].

¹ Research done under the sponsorship of the Office of Naval Research.

² A similar problem has been studied by V. J. Francis [2] for one-sided truncation; he
actually had the exact probabilities for the solution of his problem computed and tabulated
for m = 2, 4.


2. The safe value $a_1$. For fixed $a > 0$, we consider the normal random variable
$Y_a$ with expectation 0 and with probability density $g_a(Y_a)$ such that $g_a(0) = f_a(0)$.
It is easily seen that $Y_a$ has the standard deviation

$$\sigma_a = \frac{1}{\sqrt{2\pi}}\int_{-a}^{+a}e^{-t^2/2}\,dt, \tag{2.1}$$

and that $g_a(\xi) \le f_a(\xi)$ for $|\xi| \le a$, $g_a(\xi) > 0 = f_a(\xi)$ for $|\xi| > a$. Hence, applying
Theorem 1 in [1], we conclude that

$$P(|S_a^{(m)}| > A) \le \frac{2}{\sqrt{2\pi}}\int_{A/(\sigma_a\sqrt{m})}^{\infty}e^{-t^2/2}\,dt. \tag{2.2}$$

If $m$, $A$, and $\epsilon$ are given, we determine $t_\epsilon$ from tables of the normal probability
integral so that $\frac{2}{\sqrt{2\pi}}\int_{t_\epsilon}^{\infty}e^{-t^2/2}\,dt = \epsilon$, set $\sigma_a = \frac{A}{t_\epsilon\sqrt{m}}$ in (2.1), and solve the
equation

$$\frac{1}{\sqrt{2\pi}}\int_{-a}^{+a}e^{-t^2/2}\,dt = \frac{A}{t_\epsilon\sqrt{m}} \tag{2.3}$$

for $a$ using again tables of the normal probability integral. In view of (2.2) this
solution satisfies (1.3) and hence is safe; it will be denoted by $a_1$.
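The table look-ups of this section can be imitated numerically. The sketch below is an illustration under stated assumptions rather than the authors' procedure: it takes the standard deviation of $Y_a$ to be $2\Phi(a) - 1$, where $\Phi$ is the unit normal cumulative distribution, replaces the tables by bisection on `math.erf`, and uses hypothetical function names (`two_tail_point`, `safe_a1`).

```python
from math import erf, sqrt


def Phi(x):
    """Cumulative distribution function of the unit normal distribution."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def bisect_increasing(f, lo, hi, iters=200):
    """Root of an increasing function f on [lo, hi], assuming f(lo) < 0 < f(hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def two_tail_point(eps):
    """Deviate t with two-tail normal probability eps, i.e. 2 * (1 - Phi(t)) = eps."""
    return bisect_increasing(lambda t: (2.0 * Phi(t) - 1.0) - (1.0 - eps), 0.0, 10.0)


def safe_a1(A, m, eps):
    """Terminus a solving 2 * Phi(a) - 1 = A / (t * sqrt(m)); assumes the right side is below 1."""
    sigma = A / (two_tail_point(eps) * sqrt(m))
    if sigma >= 1.0:
        raise ValueError("no truncation is needed for these inputs")
    return bisect_increasing(lambda a: (2.0 * Phi(a) - 1.0) - sigma, 0.0, 10.0)


print(safe_a1(3.8, 4, 0.05))
```

With the inputs A = 3.8, m = 4 and a two-tail probability of .05 (the first worked example of the paper), the result should land close to the value 2.162 quoted there.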


3. The safe value $a_2$. A direct application of Theorem 2 in [1] yields the
inequality

$$P(|S_a^{(m)}| > A) \le \frac{1}{2^{m-1}\,m!}\sum_{\frac{1}{2}(m + A/a) < i \le m}(-1)^{m-i}\binom{m}{i}\left(2i - m - \frac{A}{a}\right)^{m} = h_m\!\left(\frac{A}{a}\right) \tag{3.1}$$

for $0 \le A \le ma$. Hence by equating $h_m(A/a)$ to $\epsilon$ and solving for $a$, we obtain a
safe value which will be denoted by $a_2$. It is of interest to note that (3.1) is true
not only for $f_a(x)$ defined by (1.1), i.e. truncated normal, but for any probability
density $f_a(x)$ which is symmetrical and unimodal, since these are the only assumptions
needed for Theorem 2 in [1].
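A short numerical sketch may help in reading the bound of this section. It assumes, as an interpretation rather than a quotation, that $h_m(t)$ is the two-tail probability $P(|U| > t)$ for the sum $U$ of $m$ independent variables uniform on $(-1, 1)$, written with the classical alternating-sum formula; the function names are hypothetical. Solving $h_m(t) = \epsilon$ by bisection then gives the ratio $A/a_2$.

```python
from math import comb, factorial


def uniform_two_tail(m, t):
    """P(|U| > t) for the sum U of m independent uniform(-1, 1) variables, 0 <= t <= m."""
    total = 0.0
    for k in range(m + 1):
        u = m - t - 2 * k
        if u > 0:  # only terms with a positive base contribute
            total += (-1) ** k * comb(m, k) * u ** m
    return total / (2 ** (m - 1) * factorial(m))


def ratio_A_over_a2(m, eps, iters=200):
    """Solve uniform_two_tail(m, t) = eps for t = A / a_2 by bisection on [0, m]."""
    lo, hi = 0.0, float(m)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if uniform_two_tail(m, mid) > eps:  # tail still too large: move right
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for m, eps in [(2, 0.01), (3, 0.05), (4, 0.05), (4, 0.02)]:
    print(m, eps, round(ratio_A_over_a2(m, eps), 3))
```

Under this reading the computed ratios can be compared with the tabulated values of $A/a_2$; for example, m = 4 with probability .05 should give a ratio near 2.240, from which $a_2 = A/2.240$.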


4. Solution for large m. The random variable $X_a$ has the variance

$$\sigma^2(X_a) = 1 + \frac{2\Phi''(a)}{2\Phi(a) - 1}, \tag{4.1}$$

where

$$\Phi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x}e^{-t^2/2}\,dt.$$

Hence, according to the central limit theorem, we have the approximate equality

$$P(|S_a^{(m)}| > A) \approx \frac{2}{\sqrt{2\pi}}\int_{A/(\sigma(X_a)\sqrt{m})}^{\infty}e^{-t^2/2}\,dt \tag{4.2}$$

for $m$ sufficiently large.

It can be reasonably expected that the cumulative distribution of $S_a^{(m)}$ differs
from its limiting normal probability integral by less than the cumulative distribution
of the sum $U_a^{(m)}$ of $m$ independent uniform variables in $(-a, +a)$ differs
from its limiting normal probability integral. Already for $m = 4$ the cumulative
distribution of $U_a^{(m)}$ differs from the corresponding normal cumulative by less
than .0075. Equally good or better approximation may, therefore, be expected
for the distribution of $S_a^{(m)}$, so that the error in the approximate equality (4.2)
between the two-tail probabilities should be less than .015 for $m = 4$, and still
less for $m > 4$.

Equating the right-hand term of (4.2) to $\epsilon$ and solving for $\sigma^2(X_a)$, we obtain

$$\sigma^2(X_a) = 1 + \frac{2\Phi''(a)}{2\Phi(a) - 1} = \frac{A^2}{m\,t_\epsilon^2},$$

where $t_\epsilon$ is the normal deviate exceeded in absolute value with probability $\epsilon$,
an equation which can be solved for $a$ with the aid of tables of $\Phi(x)$ and $\Phi''(x)$.
We denote this value of $a$ by $a_3$.


5. Use of the different solutions in practice. From the foregoing it appears
that the following procedure may be followed in solving our problem in any
definite case:

If $m$ is large, $a_3$ is very close to the exact solution of (1.3) and may be used
safely.

If $m$ is not large but $m \ge 5$, it is conjectured that $a_3$ is such that the left-hand
term in (1.3), for $a = a_3$, differs from $\epsilon$ by less than 0.015.

If $m \le 4$, the larger of $a_1$ and $a_2$ should be used. Table I contains the $A$ for
which $a_1$ and $a_2$ have the same value, say $a'$; $a_1$ or $a_2$ should be used if the given $A$
is greater or smaller, respectively, than the tabulated value. The value $a_1$ is
easily computed from a table of the normal probability integral by the procedure
of section 2. The value $a_2$ can be obtained by reading off $A/a_2$ from Table II.

TABLE I
Values of A for which a_1 = a_2 = a' for given m, ε

            m = 2            m = 3            m = 4
   ε       A      a'        A      a'        A      a'
  .001   4.568  2.357     5.446  2.008     6.152  1.842
  .002   4.258  2.228     5.059  1.918     5.717  1.779
  .005   3.808  2.047     4.512  1.799     5.111  1.697
  .01    3.438  1.910     4.074  1.712     4.632  1.640
  .02    3.034  1.765     3.614  1.630     4.131  1.589
  .05    2.456  1.581     2.970  1.533     3.425  1.529

TABLE II
Values of A/a_2 for given m, ε

   ε     m = 2   m = 3   m = 4
  .001   1.937   2.712   3.339
  .002   1.911   2.637   3.213
  .005   1.859   2.507   3.011
  .01    1.800   2.379   2.824
  .02    1.718   2.217   2.600
  .05    1.563   1.937   2.240










6. Examples. 1) $A = 3.8$, $m = 4$, $\epsilon = .05$. Since $A$ is greater than the value
3.425 in Table I, we compute $a_1 = 2.162$. From Table II we would obtain
$A/a_2 = 2.240$ and thus $a_2 = 1.696 < a_1$. 2) $A = 3$, $m = 4$, $\epsilon = .02$. Since $A < 4.131$,
we read $A/a_2 = 2.600$ from Table II and obtain $a_2 = 1.153$ which will be greater
than $a_1$. 3) $A = 5$, $m = 30$, $\epsilon = .05$. Using the method of section 4 we obtain
$a_3 = 1.62$.


REFERENCES 

[1] Z. W. Birnbaum, "On random variables with comparable peakedness," Annals of Math. Stat., Vol. 19 (1948), pp. 76-81.

[2] V. J. Francis, "On the distribution of the sum of n sample values drawn from a truncated normal population," Roy. Stat. Soc. Jour. Suppl., Vol. 8 (1946), pp. 223-232.


A CERTAIN CUMULATIVE PROBABILITY FUNCTION

By Sister Mary Agnes Hatke, O.S.F.
St. Francis College, Ft. Wayne, Indiana

Graduations of empirically observed distributions show that the cumulative
probability function $F(x) = 1 - (1 + x^{1/c})^{-1/k}$ is a practical tool for fitting a
smooth curve to observed data. The graduations are comparable with those
obtained by the Pearson system, Charlier, and others and are accomplished
with simple calculations. Given distributions are graduated by the method of
moments. Theoretical frequencies are obtained by evaluation of consecutive
values of $F(x)$ by use of calculating machines and logarithms, and by differencing
$NF(x)$. No integration nor heavy interpolation is involved, such as may be
required in graduation by a classical frequency function. Burr [1] constructed
tables of $\nu_1$, $\sigma$, $\alpha_3$, and $\alpha_4$ values for the function $F(x)$ for certain combinations
of integral values of $1/c$ and $1/k$. In these tables curvilinear interpolation must
be used in finding an $F(x)$ with desired moments. The writer constructed more
extensive tables for the same cumulative function with $c$ and $k$ a variety of
real positive numbers less than or equal to one, such that linear interpolation
can be used to determine the parameters $c$ and $k$ for an $F(x)$ that has $\alpha_3$ and
$\alpha_4$ approximately the same as those of the distribution to be graduated. These
tables have been deposited with Brown University. Microfilm or photostat copies
may be obtained upon request to the Brown University Library.

The writer used the definitions of cumulative moments and the formulas
for the ordinary moments $\nu_1$, $\sigma$, $\alpha_3$, and $\alpha_4$ in terms of cumulative moments
as developed by Burr. These latter moments were tabulated for the function $F(x)$
having various combinations of parameters $c$ and $k$, $c$ ranging from 0.050 to 0.675
and $k$ from 0.050 to 1.000, each at intervals of 0.025. Within these ranges only
those combinations of $c$ and $k$ were used which yielded $\alpha_3$ of approximately 1 or
less and $\alpha_4$ values of 6 or less, since such moments are most common in practice.

It can be verified that over most of the area of the table $\alpha_3$ values obtained
by linear and by curvilinear interpolation on $k$ (or on $c$) differ by less than 0.001
and values of $\alpha_4$ by approximately 0.01 or less. If $\alpha_3$ = constant and $\alpha_4$ = constant
curves are plotted on $c$, $k$ axes, it will be seen that there exists only one solution
$(c, k)$ of the equations $\alpha_3 = B(c, k)$ and $\alpha_4 = C(c, k)$. Furthermore, some $\alpha_4$
curves intersect two $\alpha_3$ curves representing the same $|\alpha_3|$. Thus the chance of
finding an appropriate function $F(x)$ for graduation is increased since by reversal
of scale an $F(x)$ with a positive $\alpha_3$ may be used to graduate a distribution with a
negative $\alpha_3$, and conversely.

Graduation of an observed frequency distribution is easily accomplished.
Linear interpolation on $k$ for a fixed $c$ seems to be the best method for determining
the parameters of an $F(x)$ that has $\alpha_3$ exactly the same and $\alpha_4$ nearly the same as
the observed $\alpha_3$ and $\alpha_4$. If the observed $\alpha_3$ and $\alpha_4$ are fairly close to an entry
in the table, no interpolation is required. Direct linear interpolation is used to
determine $\nu_1$ and $\sigma$ for the $c$ and $k$ just found. Letting $M$ and $S$ be the mean and
standard deviation of the given distribution, the formula,

$$\frac{x - \nu_1}{\sigma} = \frac{X - M}{S}$$

is used to translate the class limits $X$ of the given distribution to the corresponding
$x$'s of $F(x)$. For any $x$ that is negative the quantity $1 + x^{1/c}$ is taken as one
to make $F(-x) = 0$ in accordance with the definition of $F(x)$ [1]. The values of
$(1 + x^{1/c})^{-1/k}$ for the various $x$'s are computed by logarithms and differenced to
obtain the probabilities for the given class intervals, according to equation

$$P(a < x \le b) = \int_a^b f(x)\,dx = F(b) - F(a).$$

The respective theoretical frequencies are these probabilities multiplied by $N$,
the number of cases.

Fig. 1. The $\alpha_3^2$, $\delta$ chart for the Pearson system of frequency curves and the area covered
by $F(x) = 1 - (1 + x^{1/c})^{-1/k}$ (subscript L — bell-shaped)
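The differencing step just described can be sketched as follows. This is an illustration only: the function names are hypothetical, and the numerical inputs (class limits, $M$, $S$, $\nu_1$, $\sigma$, $c$, $k$, $N$) are made-up placeholders, not entries from the writer's tables, which would supply $\nu_1$ and $\sigma$ for the chosen $c$ and $k$.

```python
def burr_cdf(x, c, k):
    """F(x) = 1 - (1 + x**(1/c))**(-1/k), taken as 0 for x <= 0."""
    if x <= 0.0:
        return 0.0
    return 1.0 - (1.0 + x ** (1.0 / c)) ** (-1.0 / k)


def theoretical_frequencies(class_limits, M, S, nu1, sigma, c, k, N):
    """Translate class limits X to x via (x - nu1)/sigma = (X - M)/S, then difference N*F(x)."""
    xs = [nu1 + sigma * (X - M) / S for X in class_limits]
    F = [burr_cdf(x, c, k) for x in xs]
    return [N * (F[i + 1] - F[i]) for i in range(len(F) - 1)]


# Placeholder inputs chosen only to exercise the functions (not tabled values).
limits = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
freqs = theoretical_frequencies(limits, M=3.0, S=1.0, nu1=0.8, sigma=0.25,
                                c=0.25, k=0.25, N=500)
print([round(f, 1) for f in freqs])
```

Because the frequencies are successive differences of $NF(x)$, their sum equals $N$ times the difference of $F$ at the last and first class limits, which gives a quick check on the arithmetic.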

The headings that proved satisfactory for the columns of the graduation
work-sheet are: class intervals (in observed physical units), $X$ (a unit class-interval
is used), $f_{obs.}$, $x$, $1 + x^{1/c}$, $N/(1 + x^{1/c})^{1/k}$, and $f_{th.}$.

The relation of $F(x)$ to the Pearson system of frequency curves is presented in
Figure 1, which is a reproduction of a major part of Craig's chart for $\alpha_3^2$ and
$\delta$ [2]. In this chart the parameters of the twelve Pearson curves are expressed in
terms of $\alpha_3^2$ and $\delta$, where $\delta = (2\alpha_4 - 3\alpha_3^2 - 6)/(\alpha_4 + 3)$. Values of $\alpha_3^2$ and $\delta$
were computed for $F(x) = 1 - (1 + x^{1/c})^{-1/k}$ in which $c$ and $k$ were assigned
the values listed in the $\alpha_3$, $\alpha_4$ table. The dotted area superimposed on the Craig
chart is that covered by these $\alpha_3^2$, $\delta$ values for $F(x)$. Although it is small in size
compared to the total area, it contains a part of the areas representing the three
main Pearson curves, I, IV, and VI, as well as the point for the normal curve
and part of the line on which lie the points corresponding to the bell-shaped
curves of the Type III functions. It also includes transitional Types V and VII.
Thus the function $F(x)$ covers part of an important area on the $\alpha_3^2$, $\delta$ chart for
the Pearson curves.

The function $F(x)$ was used to graduate satisfactorily several observed distributions
classified as Pearson types, including the three main Types, I, IV, and
VI, and transitional Types III and VII.

One advantage in the use of this cumulative function $F(x)$ is that it takes but
one symbolic form within the area covered, whereas the Pearson-system curves
require several different expressions of various complexity requiring identification
of type. Furthermore, graduation by a Pearson function generally involves
approximate integration or heavy interpolation in the incomplete beta function
tables for the evaluation of the integrals of the Pearson functions, whereas
graduation by a function $F(x)$ is easily and quickly performed since $F(x)$ only
involves two number-parameters readily determined by means of the $\alpha_3$, $\alpha_4$
table and straight arithmetic.

The writer is deeply indebted to Professor Irving W. Burr of Purdue Uni¬ 
versity for valuable suggestions in this study. 

REFERENCES 

[1] I. W. Burr, "Cumulative frequency functions," Annals of Math. Stat., Vol. 13 (1942), pp. 215-232.

[2] C. C. Craig, "A new exposition and chart for the Pearson system of frequency curves," Annals of Math. Stat., Vol. 7 (1936), pp. 16-28.



ABSTRACTS OF PAPERS 

(Presented at the Berkeley Meeting of the Institute, June 16-18, 1949) 

1. Extension of a Theorem of Blackwell. E. W. Barankin, University of California, Berkeley.

It is proved that Blackwell's method of uniformly improving the variance of an unbiased estimate by taking the conditional expectation with respect to a sufficient statistic
is, in fact, similarly effective on every absolute central moment of order $s \ge 1$. The method
leads to finer detail concerning the relationship between an estimate and its thus derived
one. (This paper was prepared with the partial support of the Office of Naval Research.)

2. On the Existence of Consistent Tests. Agnes Berger, Columbia University,
New York.

Let $\mathfrak{M}(\mathfrak{B})$ denote the space of all probability measures defined over a common Borel
field $\mathfrak{B}$. Let $\{m\} = M$, $\{m'\} = M'$ be two disjoint subsets of $\mathfrak{M}(\mathfrak{B})$ and let $H_0$ ($H_1$) be the
hypothesis stating that the unknown distribution is in $M$ ($M'$). In Neyman's terminology
$H_0$ can be consistently tested against $H_1$ if to any preassigned $\epsilon > 0$ there exists an integer
$n$ and a critical region in the product-space of $n$ independent observations such that the
probabilities of the errors of the first and second kind corresponding to this region are
simultaneously smaller than $\epsilon$. A sufficient condition which for a certain type of consistent
test is also necessary is established. The condition is satisfied whenever the disjoint sets
$M$ and $M'$ are closed and compact with respect to a certain suitable topology introduced
on $\mathfrak{M}(\mathfrak{B})$. Thus for instance $H_0$ can be consistently tested against $H_1$ if $M$ and $M'$ contain
only a finite number of measures or if the measures in $M$ resp. $M'$ depend continuously on
a parameter ranging over a closed and bounded subset of some Euclidean space.

3. Effect of Linear Truncation in a Multinormal Population. Z. William Birnbaum, University of Washington, Seattle.

Let $(X, Y_1, Y_2, \ldots, Y_{n-1})$ have a non-singular $n$-dimensional normal probability
density $f(X, Y_1, Y_2, \ldots, Y_{n-1})$ for which all parameters are given, and let $\varphi(X, Y_1, Y_2, \ldots, Y_{n-1})$
be the probability density obtained from $f$ by truncation along a given hyperplane:
$\varphi = Cf$ for $a_1 Y_1 + \cdots + a_{n-1} Y_{n-1} \ge aX + b$, $\varphi = 0$ elsewhere. What is the marginal
distribution of $X$ for this truncated distribution? This question can be answered by using
a set of tables with only two parameters. These tables make it also possible to solve problems such as: determine the plane of truncation so that the marginal distribution of $X$ has
certain required properties. (This paper was prepared under the sponsorship of the Office
of Naval Research.)

4. Statistical Problems in the Theory of Counters. (Preliminary Report). Colin
R. Blyth, University of California, Berkeley.

The assumptions made about counter action and distribution of incident particles are
the same as those of B. V. Gnedenko [On the theory of Geiger-Müller counters, Journ. Exper. i Teor. Fiz., Vol. 11 (1941)]. The distribution of the number $X$ of particles registered
during a given time $(0, t)$ is found explicitly, in terms of the density $a(v)$ of incident particles at time $v$. The problem considered is that of estimating the parameters of $a(v)$. For
the special case $a(v) = a$, the distribution of $X$ reduces to

$$P\{X = x\} = \frac{a^x (t - x\tau)^x}{x!}\, e^{-a(t - x\tau)} + e^{-a(t - x\tau)} \sum_{i=0}^{x-1} \frac{a^i (t - x\tau)^i}{i!} - e^{-a[t - (x-1)\tau]} \sum_{i=0}^{x-1} \frac{a^i [t - (x-1)\tau]^i}{i!}$$

for $x = 1, 2, \ldots, s$, where $s = [t/\tau]$; $P\{X = s + 1\} = 1 - e^{-a(t - s\tau)} \sum_{i=0}^{s} a^i (t - s\tau)^i / i!$; $P\{X > s + 1\} = 0$. This distribution has been found in another
problem by J. Neyman [On the problem of estimating the number of schools of fish, submitted
to Statistical Series, Univ. of Calif. Press]. For this special case the maximum likelihood
estimate $\hat{a}$ of $a$ is found to be given by $\hat{a}\tau \exp(\hat{a}\tau) = [1 + \tau/(t - x\tau)]^x \, x\tau/(t - x\tau)$. If
$\tau/(t - x\tau)$ is small, as will usually be the case, $\hat{a}$ will be close to the estimate $x/(t - x\tau)$
usually used for $a$.
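
The estimating equation above has the form $w e^{w} = \text{const}$ with $w = \hat{a}\tau$, so it can be solved numerically in a few Newton steps. The Python sketch below is only an illustration: the function names are hypothetical, $\tau$ is read here as the counter's dead time (an assumption, since the abstract does not define it), and the right-hand side is taken as $[1 + \tau/(t - x\tau)]^x\, x\tau/(t - x\tau)$.

```python
import math

def solve_w_exp_w(rhs, tol=1e-14, max_iter=100):
    """Solve w * exp(w) = rhs for w >= 0 (rhs >= 0) by Newton's method."""
    if rhs < 0:
        raise ValueError("rhs must be non-negative")
    w = math.log1p(rhs)  # starting value; it is never below the root
    for _ in range(max_iter):
        f = w * math.exp(w) - rhs
        step = f / (math.exp(w) * (1.0 + w))
        w -= step
        if abs(step) < tol:
            break
    return w

def mle_intensity(x, t, tau):
    """Root of a*tau*exp(a*tau) = (1 + tau/(t - x*tau))**x * x*tau/(t - x*tau)."""
    free = t - x * tau            # observation time not blocked by the counter
    if free <= 0:
        raise ValueError("need t - x*tau > 0")
    u = tau / free
    rhs = (1.0 + u) ** x * x * u
    return solve_w_exp_w(rhs) / tau

def usual_estimate(x, t, tau):
    """The estimate x / (t - x*tau) usually used for a."""
    return x / (t - x * tau)
```

With, say, `x = 50`, `t = 100.0`, `tau = 0.01`, the two estimates agree to several significant figures, which is the behaviour the abstract describes when $\tau/(t - x\tau)$ is small.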

5. Some Two-Sample Tests. Douglas G. Chapman, University of California, 
Berkeley. 

Let $X$, $Y$ be random variables normally distributed with means $\xi$, $\eta$ and variances $\sigma_1^2$, $\sigma_2^2$
respectively. The two sample procedure formulated by Stein to obtain a test with power
independent of $\sigma$, for the hypothesis $\xi = \xi_0$, is used here to determine a test for the hypothesis
$\xi/\eta = r$ ($r$ any pre-assigned real number). The size and power of this test are independent of
$\sigma_1$ and $\sigma_2$. The two sample procedure may be extended to the more general case of testing
the hypothesis of equality of means of several normal populations, the variances being
unknown. Approximate tests are obtained for this case. Finally it is shown that this two
sample procedure can be used to select that normal population, of several, with the greatest
mean: the rule of selection having a preassigned level of accuracy. (This paper was prepared with the partial support of the Office of Naval Research.)

6. Minimum Variance in Non-Regular Estimation. R. C. Davis, U. S. Naval 
Ordnance Test Station, Inyokern. 

The Cramér-Rao inequality for the minimum variance of a regular estimate of an unknown parameter of a probability distribution is extended to a broad class of non-regular
types of estimation. The theory is developed only for the case in which a probability density function and a sufficient statistic for the unknown parameter exist. For every non-regular estimation problem included in the above class, it is proved that there exists a
unique unbiased estimate which attains minimum variance, and a method is given for
obtaining the sample estimate. Examples are given, such as the rectangular distribution,
a class of truncated distributions, etc.

7. Auxiliary Random Variables. Mark W. Eudey, California Municipal Statistics, Inc., San Francisco.

In testing hypotheses concerning discontinuous random variables it is not possible to
find regions of arbitrary size, and so if we compare two critical regions, selection between
them on the basis of the usual criteria of the Neyman-Pearson theory of testing hypotheses
may be confused by the difference in their sizes. This difficulty may be avoided by allowing
the statistician to use a mixed strategy in such cases, and make his decision to accept or
reject the hypothesis depend upon an independent auxiliary random variable. For example,
if $K$ is a binomial variable, and $U$ has a uniform distribution $(0, 1)$, then $Z = K + U$ may
be used to test hypotheses concerning the binomial parameter, and regions of any size may
be found. For the binomial case this procedure leads to a class of uniformly most powerful
tests for one-sided alternatives, and to uniformly most powerful unbiased tests for two-sided alternatives. Similar results are obtained for other common discontinuous variables,
and the same device may be used in considering confidence regions and decision functions
for such variables. (This paper was prepared with the partial support of the Office of
Naval Research.)
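
To make the device concrete, the Python sketch below finds, for a one-sided binomial test, the critical value $z$ for which the region $\{K + U > z\}$ has exactly a prescribed size. It is a minimal sketch, assuming $0 < p_0 < 1$ and $0 < \alpha < 1$; the function names are hypothetical and the code is not taken from the paper.

```python
import math

def binom_pmf(n, k, p):
    """P(K = k) for K binomial with index n and parameter p."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)

def upper_tail_of_z(n, p, z):
    """P(K + U > z), with K ~ Bin(n, p) and U ~ Uniform(0, 1) independent."""
    if z <= 0:
        return 1.0
    if z >= n + 1:
        return 0.0
    j = math.floor(z)
    frac = z - j
    tail = sum(binom_pmf(n, k, p) for k in range(j + 1, n + 1))
    # K > j always exceeds z; K = j exceeds z only when U > frac.
    return tail + (1.0 - frac) * binom_pmf(n, j, p)

def critical_value(n, p0, alpha):
    """Return z such that P(K + U > z) equals alpha when p = p0."""
    tail = 0.0  # running value of P(K > j)
    for j in range(n, -1, -1):
        pj = binom_pmf(n, j, p0)
        if tail + pj >= alpha:
            return j + 1.0 - (alpha - tail) / pj
        tail += pj
    return 0.0
```

For `n = 10`, `p0 = 0.5`, `alpha = 0.05`, the critical value falls between 8 and 9, and the region has size exactly .05, which no region based on $K$ alone can achieve.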

8. Estimation in Truncated Samples. Max Halperin, The Rand Corporation,
Santa Monica, California.

A death process is considered which starts with $n$ individuals of zero age, each following
the mortality law $f(x, \theta)$. That is,

$$F(t) = \Pr\{\text{Age at death} \le t\} = \int_0^t f(x, \theta)\, dx,$$

where $f(x, \theta)$ is a probability density. We suppose we truncate the process at a fixed time,
and wish to estimate $\theta$ when

a) individuals who die are not replaced, and

b) individuals who die are replaced by individuals of zero age following the mortality
law $f(x, \theta)$.

In both cases, it is found that, under mild conditions, estimation by Maximum Likelihood gives optimum estimates. The estimates are best in the sense of being asymptotically
normally distributed and of minimum variance for large samples.

The proofs are given for the case of a single parameter, but can be extended to the multiparameter case. Examples are given.
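
As a concrete instance of the two schemes, the Python sketch below assumes the exponential mortality law $f(x, \theta) = \theta e^{-\theta x}$, which is chosen here only for illustration and is not stated in the abstract. Under that assumption both maximum likelihood estimates have closed forms: deaths divided by total time lived. The function names are hypothetical.

```python
def mle_without_replacement(death_times, n, cutoff):
    """Case (a): n individuals, deaths observed up to the fixed time `cutoff`.

    The likelihood is prod(theta * exp(-theta * x_i)) over the d observed
    deaths times exp(-theta * cutoff) for each of the n - d survivors, so
    theta_hat = d / (sum of death times + (n - d) * cutoff).
    """
    d = len(death_times)
    if d > n or any(x < 0 or x > cutoff for x in death_times):
        raise ValueError("inconsistent data")
    exposure = sum(death_times) + (n - d) * cutoff
    return d / exposure

def mle_with_replacement(num_deaths, n, cutoff):
    """Case (b): each death is replaced at once, so n individuals are always
    at risk and the total time lived is n * cutoff."""
    return num_deaths / (n * cutoff)
```

For example, with `n = 5`, `cutoff = 4.0` and deaths at times 1, 2 and 3, case (a) gives $3/14$, while three deaths under case (b) give $3/20$.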

9. Some Problems in Point Estimation. J. L. Hodges, Jr. and E. L. Lehmann, 
University of California, Berkeley. 

Some point estimation problems are considered in the light of Wald's general theory. It
is shown that when the loss function is convex, one may restrict consideration to nonrandomized estimates based on sufficient statistics. Minimax estimates are obtained in a
number of cases connected with the binomial and hypergeometric distributions, and with
some non-parametric problems. Some prediction problems are also considered. (This paper
was prepared with the partial support of the Office of Naval Research.)
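
The abstract does not state the estimates themselves. As an illustration of what a minimax estimate looks like in the binomial case, the Python sketch below uses the well-known estimate $(X + \sqrt{n}/2)/(n + \sqrt{n})$ for squared-error loss, whose risk is the constant $1/[4(1 + \sqrt{n})^2]$ for every $p$. The choice of squared-error loss and all function names are assumptions made for this sketch.

```python
import math

def binom_pmf(n, k, p):
    """P(X = k) for X binomial with index n and parameter p."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)

def risk(estimator, n, p):
    """Expected squared error of estimator(k, n) when the true value is p."""
    return sum(binom_pmf(n, k, p) * (estimator(k, n) - p) ** 2
               for k in range(n + 1))

def minimax_estimate(k, n):
    """(k + sqrt(n)/2) / (n + sqrt(n)); its squared-error risk is constant in p."""
    root = math.sqrt(n)
    return (k + root / 2.0) / (n + root)

def usual_estimate(k, n):
    """The observed proportion k / n, with risk p * (1 - p) / n."""
    return k / n
```

For `n = 16` the constant risk is $1/100$; the observed proportion has smaller risk near $p = 0$ or $p = 1$ but larger risk near $p = 1/2$, so its maximum risk is the greater of the two.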

10. Completeness in the Sequential Case. E. L. Lehmann and C. Stein, University of California, Berkeley.

Recently, in a series of papers, Girshick, Mosteller, Savage and Wolfowitz have considered the uniqueness of unbiased estimates depending only on an appropriate sufficient
statistic for sequential sampling schemes of binomial variables. A complete solution was 
obtained under the restriction to bounded estimates. This work, which has immediate 
consequences with respect to the existence of unbiased estimates with uniformly minimum 
variance, is extended here in two directions. A general necessary condition for uniqueness 
is found, and this is applied to obtain a complete solution of the uniqueness problem when 
the random variables have a Poisson or rectangular distribution. Necessary and sufficient 
conditions are also found in the binomial case without the restriction to bounded estimates. 
This permits the statement of a somewhat stronger optimum property for the estimates, 
and is applicable to the estimation of unbounded functions of the unknown probability. 

11. The Ratio of Ranges. Richard F. Link, University of Oregon, Eugene.

The distribution of the ratio of two ranges from independent samples drawn from a
normal population is given analytically for $n_1$ and $n_2 \le 3$. A table of percentage values, $R$,
is given for $\alpha$ = .005, .01, .025, .05, .10 and for all combinations of $n_1$ and $n_2$ up to 10, where
$\alpha = \Pr\{w_1/w_2 > R\}$ and $w_1$ and $w_2$ are the observed ranges. (This paper was prepared under
the sponsorship of the Office of Naval Research.)


12. Some Problems Arising in Plant Selection and the Use of Analysis of
Variance. Stanley W. Nash, University of California, Berkeley. 

The yields of many ($m$) varieties are compared in a field trial. A few varieties having the
highest and lowest yields in this trial are selected for further testing. What chance is there
that the first trial will give a significant result, the second trial not? Assume that the true
mean yields of the varieties are themselves normally, independently
distributed with variance $\sigma_1^2$. Let $P_k$ ($k = 1, 2$) denote the probability of a significant result
in the $k$th trial, using the $F$-test. For fixed $\sigma_1^2 > 0$, $\lim_{m\to\infty} P_1 = 1$. (See Nash, Annals of
Math. Stat., Vol. 19 (1948), p. 434.) Now let $\sigma_1^2 > 0$ take on a decreasing sequence of values

as m increases. If ; - « 0 f- J, then limm-»<« Pi — 1. Here 1 -f a\g{m) » 


^(numerator of F) 
<rj (“ error variance) 

-0^ ^ 


7ig(m) 


Also lim«—„ P 2 < I if and only if 


Wlogm/ 


For 


( - 7 ;i=r I ,Iim«-.«P 2 « a, the level of significance used. Thus, corresponding to any 
V log m/ 

$m$, however large, one can find values of $\sigma_1^2$ for which the chances are considerable (or even
approaching $1 - \alpha$), that the two field trials will give opposite conclusions when the $F$-test is used.


13. Asymptotic Properties of the Wald-Wolfowitz Test of Randomness. Gottfried E. Noether, Columbia University, New York.

Let $a_1, \ldots, a_n$ be observations on the chance variables $X_1, \ldots, X_n$. Wald and Wolfowitz
(Annals of Math. Stat., Vol. 14 (1943), pp. 378-388) have shown how the statistic
$R_h = \sum_{i=1}^{n} x_i x_{i+h}$, ($x_{n+j} = x_j$), can be used to test the null hypothesis that the $X_i$, ($i = 1, \ldots, n$),
are independently and identically distributed by considering the distribution of $R_h$ in
the subpopulation of all permutations of the $a_i$. In the present paper it is shown that when
the null hypothesis is true this distribution of $R_h$ is asymptotically normal provided
$\sum_{i=1}^{n} (a_i - \bar{a})^r / [\sum_{i=1}^{n} (a_i - \bar{a})^2]^{r/2} = o[n^{-(r-2)/4}]$, ($r = 3, 4, \ldots$), a condition which is satisfied
with probability 1 if the $a_i$ are independent observations on the same chance variable $X$
having positive variance and a finite absolute moment of order $4 + \delta$, ($\delta > 0$). Conditions
are given for the consistency of the test based on $R_h$ when under the alternative hypothesis
observations are drawn independently from changing populations. In particular a downward trend and a regular cyclical movement are considered, both for ranks and original
observations. For the special case of a regular cyclical movement of known length the
asymptotic relative efficiency of the rank test with respect to the test performed on original
observations is found. It is shown that when using ranks, $R_h$ is asymptotically normal
under the alternative hypothesis provided $\liminf_{n\to\infty} \operatorname{var}(n^{-5/2} R_h) > 0$. This asymptotic
normality of $R_h$ is used to compare the asymptotic power of the $R_h$-test with that of the
Mann $T$-test (Econometrica, Vol. 13 (1945), pp. 245-259) for the case of a downward trend.
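
The statistic and its permutation distribution can be sketched in a few lines of Python. This is only an illustration for very small $n$, where all $n!$ permutations can be listed; the function names are hypothetical. The closed form used for the permutation mean, $[(\sum a_i)^2 - \sum a_i^2]/(n - 1)$, follows from the fact that every ordered pair of distinct observations is equally likely to occupy positions $i$ and $i + h$; it requires that $h$ not be a multiple of $n$.

```python
from itertools import permutations

def serial_statistic(x, h):
    """R_h = sum of x[i] * x[i+h], with indices taken circularly."""
    n = len(x)
    return sum(x[i] * x[(i + h) % n] for i in range(n))

def permutation_mean(a, h):
    """Mean of R_h over all permutations of a (h not a multiple of n)."""
    n = len(a)
    if h % n == 0:
        raise ValueError("h must not be a multiple of n")
    s1 = sum(a)
    s2 = sum(v * v for v in a)
    return (s1 * s1 - s2) / (n - 1)

def permutation_distribution(a, h):
    """All n! values of R_h, sorted; feasible only for small n."""
    return sorted(serial_statistic(p, h) for p in permutations(a))

def upper_p_value(a, h):
    """Share of permutations whose R_h is at least the observed value."""
    observed = serial_statistic(a, h)
    values = permutation_distribution(a, h)
    return sum(1 for v in values if v >= observed) / len(values)
```

For larger $n$ the enumeration is impractical, which is where the asymptotic normality established in the paper becomes useful.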

14. On the Similar Regions of a Class of Distributions. Stefan Peters, University of California, Berkeley.

The class of distributions considered is essentially the class of those distributions of $n$
variables which, by a suitable transformation of the variables and the parameter, can be
transformed into distributions defined in the whole $R_n$ for which the parameter is a location
parameter. These regions satisfy a certain partial differential equation. The transformed
distributions of the variables $y_1, \ldots, y_n$ and parameter $\tau$ possess a class $D_1$ of similar
regions with respect to $\tau$ which can be defined as the smallest additive class of regions
which includes all regions defined by

$$\varphi[(y_1 - y_n), \ldots, (y_{n-1} - y_n)] \le C$$

where $\varphi$ is a continuous function. The class $D_1$ does not exhaust all similar regions. There
exists among the regions of class $D_1$ one which is most powerful for testing a given additional parameter $\sigma$. If there exists among all similar regions a most powerful region for
testing $\sigma$, then that region will be the most powerful region of class $D_1$.

15. Some Problems in Sequential Analysis. Charles M. Stein, University of 
California, Berkeley. 

Wald’s fundamental identity for cumulative sums is extended to dependent random 
variables. The first derivative of this at the origin is equivalent to a result of Wolfowitz 
(Annals of Math. Stat., Vol. 18 (1947), p. 228, Th. 7.4). Higher derivatives of this at the
origin can also be obtained from linear combinations of Wolfowitz's result applied to suitable
products of the original random variables. These equations yield approximate OC and ASN
curves for probability-ratio tests for a simple hypothesis against a single alternative concerning some of the more usual stationary Markoff chains. Bounds for the amount by
which the ASN exceeds that of the most efficient test are also obtained. The results are 
applied in particular to random variables taking on only the values 0, 1 with conditional 
probabilities depending only on a finite number of the preceding observations. The case 
of linear dependence of normal random variables with fixed conditional variance is also 
considered. 

16. Some Aspects of Links Between Prediction Problems and Problems of 
Statistical Estimation. Erling Sverdrup, University of Oslo. 

A prediction is not taken as a probability statement about additional observations of
the random variable already observed. It is presumed that the statistical interpretation
of the sample will result in some action influencing the random variable subject to prediction. The probability distribution of this random variable is given for each of an a priori
class of probability functions for the observed random variable and for each of a class of
possible actions. "Utility" as a function of the random variable to be predicted and of the
action is defined. It is shown that the problem of which action to take in order to maximize 
expected utility is identical with a problem of statistical inference with a uniquely defined 
weight function in the Wald sense. It is further shown that this procedure is adaptable to 
stochastic processes of a general type and this provides a means of connecting the theory 
of stochastic processes with the theory of statistical inference. Some examples are given to 
illustrate the general theory. 

17. Some Large Sample Tests for the Median. John E. Walsh, The Rand 
Corporation, Santa Monica, California. 

Consider a large number of independent observations from continuous populations with
a common median. Some non-parametric large sample tests for the population median are
presented which are based on either two or three order statistics of the sample. If all the
populations are symmetrical, these tests are equal-tailed with specified significance level $\alpha$.
If the observations are a sample from a normal population, these tests have high power
efficiencies. Some tests based on three order statistics are developed which also have significance level $\alpha$ if all the populations are not symmetrical; however, in this case the resulting
test is one-tailed instead of equal-tailed. Using these tests for situations where the populations are believed to be symmetrical furnishes a safety factor with respect to Type I error.
Tests are presented for the special case where each population is either symmetrical or
skewed in a specified direction. If the populations are not symmetrical the significance
level distribution is $.4\alpha$ to one tail and $.6\alpha$ to the other, rather than $.5\alpha$ to each tail. Also
some non-parametric large sample tests of whether a sample is from a symmetrical population are derived. These tests are based on three order statistics of the sample and have
bounded significance levels.

18. Continuous Sampling Plans from the Risk Point of View. Zivia S. Wurtele, 
Stanford University, California. 

The quality of a lot can be improved by a screening process whereby the defective items
found during inspection are replaced by non-defective items. The type of sampling plan
adopted will generally depend upon the cost of inspecting items, the number of defective
items in the lot prior to inspection, and the loss due to defective items remaining in the lot
after inspection. The loss if the lot is accepted after $d$ defectives are found in a sample of
$n$ items is equal to $c(n) + h(D)$ where $D$ is the number of defectives left in the lot and $c(n)$
is the cost of inspecting $n$ items. An inspection procedure $S$ is defined by a set of stopping
points $\{(d, n)\}$. Let $r(p, S)$ be the expected loss if $p$ is the probability of a defective and
the procedure $S$ is used. It is assumed that the lot is obtained from a binomial population.
For any a priori distribution $F(p)$, a Bayes procedure is one which minimizes the expected
risk,

$$\int r(p, S)\, dF(p).$$

A systematic method of obtaining Bayes solutions exists, but the computations are formidable. Under fairly general conditions the Bayes solutions are shown to be multiple sampling
plans, in which the size of the $i$th sample depends upon the number of defectives in the
$(i - 1)$st sample. In particular, if the production is in a state of statistical control, a
Bayes solution is a fixed sample size. It is also shown that for most reasonable loss functions, there exists no mini-max procedure which is uniformly better than the trivial one;
namely, the Bayes procedure if $p = 1$.



NEWS AND NOTICES 

Readers are invited to submit to the Secretary of the Institute news items of interest.

Personal Items 

Dr. Irving Burr has been promoted to a full professorship at Purdue University. 

Dr. D. A. S. Fraser, who received his Ph.D. degree at Princeton University in 
June, has accepted a position as Instructor of Mathematics at the University 
of Toronto. 

Dr. H. K. Hartline, formerly at the Johnson Research Foundation of the University of Pennsylvania, has accepted an appointment as chairman of the Thomas
Jenkins Department of Biophysics, Johns Hopkins University.

Dr. Leo Katz has been promoted to an associate professorship in the Mathematics Department of Michigan State College, East Lansing, Michigan.

Professor D. D. Kosambi of Tata Institute for Fundamental Research, Bombay, India served as Visiting Professor at the University of Chicago for the
Winter Quarter.

Dr. H. G. Landau has resigned his position with the Ballistic Research Laboratories and is now a Research Associate with the Committee on Mathematical
Biology, at the University of Chicago.

Mr. Allen L. Mayerson, formerly an Associate in the Division of Statistics
and Research of the Institute of Life Insurance at New York City, has accepted
a position with the National Surety Corporation of New York.

Mr. Raymond P. Peterson, who has been an Assistant in the Mathematics
Department of the University of California at Los Angeles and also a graduate
student there, has accepted a position with the Institute for Numerical Analysis
at Los Angeles.

Professor Edwin J. G. Pitman has returned to the Mathematics Department
of the University of Tasmania after spending about a year and a half in the
United States. From February to June of 1948 he was at Columbia University
as visiting Professor of Mathematical Statistics. The rest of the time was spent
at North Carolina and Princeton.

A. Ananthapadmanabha Rau has returned to India after studying at the
Statistical Laboratory in Ames, Iowa. In addition to heading the Department
of Statistics and Agriculture Meteorology of the Government of the State of
Mysore, India, he is working on sampling design of experiments, and climatology
and teaching statistics and climatology at the College of Agriculture.

Dr. Andrew Sobczyk of Watson Laboratories has been appointed to an
assistant professorship at Boston University.

Assistant Professor S. L. Thompson of Alabama Polytechnic Institute has 
been promoted to an associate professorship. 

William J. Youden is acting as Assistant Chief of the Statistical Engineering
Section of the National Bureau of Standards and as special advisor to the Director on the problems of statistical and mathematical design of major experiments
in physics, chemistry and engineering.

Two Doctorates in Mathematical Statistics were awarded at the University of 
North Carolina in June, 1949. The recipients were Uttam Chand, who has now 
been appointed Assistant Professor of Mathematics at Boston University, and 
Ralph A. Bradley, who will be Assistant Professor of Mathematics at McGill 
University. 

The Educational Testing Service, Princeton, N. J., announces the appointment 
of Elbert Lee Hoffman and William Edward Kline as ETS Psychometric Fellows 
for 1949-50 for graduate study in psychology at Princeton University. Mr. 
Hoffman is a graduate of the University of Oklahoma, and Mr. Kline has received 
both his bachelor’s and master’s degree from Yale University. Bert F. Green, Jr. 
and Warren S. Torgerson have received reappointments as ETS Psychometric 
Fellows. Each Fellow carries a full program of graduate study in psychology at 
Princeton University, including basic work in experimental and theoretical 
psychology. Special training is also given in mathematical statistics and modern
quantitative methods as applied to psychological problems in such fields as 
learning, testing and attitude measurement, as well as in the techniques of 
developing aptitude and achievement tests. In addition to the graduate program 
in psychology, each Fellow spends part-time in training and research work with 
the Educational Testing Service. 

Preliminary Actuarial Examinations 
Prize Awards 

The winners of the prize awards offered by the Society of Actuaries to the
nine undergraduates ranking highest on the score of Part 2 of the 1949 Preliminary Actuarial Examinations are as follows:

First Prize of $200

Moran, Joseph W. .... Yale University

Additional Prizes of $100

Farmer, Thurston P., Jr. .... State University of Iowa
Haakenstad, Dale L. .... University of Michigan
Hauke, William V. .... University of Michigan
Lordan, Joseph D. .... Massachusetts Institute of Technology
Mayberry, John P. .... University of Toronto
Murch, Alan D. .... University of Toronto
White, William A. .... Dartmouth College
Zemach, Ariel .... Harvard University

The Society of Actuaries has authorized a similar set of nine prize awards 
for the 1950 Examinations on Part 2. 

The Preliminary Actuarial Examinations consist of the following three 
examinations: 

Part 1. Language Aptitude Examination.

(Reading comprehension, meaning of words and word relationships, antonyms,
and verbal reasoning.)

Part 2. General Mathematics Examination.

(Algebra, trigonometry, coordinate geometry, differential and integral calculus.) 

Part 3. Special Mathematics Examination. 

(Finite differences, probability and statistics.) 

The 1950 Preliminary Actuarial Examinations will be administered by the 
Educational Testing Service at centers throughout the United States and 
Canada on May 19, 1950. The closing date for applications is March 15, 1950. 
Detailed information concerning the Examinations can be obtained from: 

The Society of Actuaries 
208 South LaSalle Street 
Chicago 4, Illinois 


New Members 

The following persons have been elected to membership in the Institute 
(March 1, 1949 to May 31, 1949) 

Alcantara de Oliveira, Eduardo, Ph.D. (Univ. de Sao Paulo) Professor, Faculdade de Filosofia, University of Sao Paulo, Rua Sergipe, 96-Ap. Sao Paulo, Brazil.

Ashby, Wallace L., A.B. (George Washington Univ.) Agricultural Statistician, S7J^6 Jocelyn 
Streetf Washington 15^ D. C. 

Bailey, Edward W., B.Ch. (Ohio State Univ.) Quality Control Supervisor, Carbide and 
Carbon Chemicals Corporation, Y-12 Plant, 101 Moylan Lane, Oak Ridge, Tennessee, 

Berger, Agnes P., Ph.D. (Budapest) 10 Park Avenue, New York, New York. 

Brown, Walter C., B.S. (Colorado A&M College) Graduate Assistant, Department of Mathematics, University of Oklahoma, 1130 Trout, Norman, Oklahoma.

Calvin, Lyle D., B.S. (Univ. of Chicago) Research Graduate Assistant, Institute of Statistics,
North Carolina State College, Raleigh, North Carolina.

Carlyle, Charles G., B.S. (Univ. of Illinois) Graduate student at University of Illinois, 
C-S2 Stadium Terrace, Champaign, Illinois. 

Chen, Yu-nien, M.A. (Harvard) Graduate student, Harvard University, H7-60, Apt. D,
Charter Road, Jamaica 2, New York. 

Clark, Fred J., Jr., B.S. (Colorado A&M College) Graduate Assistant at University of 
Illinois, Department of Mathematics, 61 A Court G, Stadium Terrace, Champaign, 
Illinois. 

Cohen, Samuel E., M.A. (Univ. of Pennsylvania) Statistician, U. S. Bureau of Labor Sta¬ 
tistics, J!^9 Galveston St., ^S.TF., Washington 20, D. C. 

Cole, Randal H., Ph.D. (Univ. of Wisconsin) Associate Professor, University of Western 
Ontario, London, Canada. 

Comrey, Andrew L., Ph.D. (Univ. of Southern Calif.) Assistant Professor of Psychology,
University of Illinois, Urbana, Illinois. 

Cook, Ellsworth B., B.S. (Springfield College) Head of Visual Screening Devices Research 
and Statistics Facility, U. S. Naval Medical Research Laboratory, Box 45, Submarine 
Base, New London, Connecticut. 

Cox, David R., Ph.D. (Leeds, England) Statistician, Wool Industries Research Association, 
2 Sunset Avenue, Leeds 6, Yorks, England. 

Denbow, Carl H., Ph.D. (Univ. of Chicago) Associate Professor of Mathematics, U. S. 
Naval Postgraduate School, Annapolis, Maryland. 

Dillon, Gregory M., A.B. (Long Island Univ.) Statistician, Pension Statistics Section,
Treasury Department, E. I. DuPont de Nemours & Co., ISSl Cedar Street, Wilmington,
Delaware.

Duarte, Geraldo Garcia, Licenciado em Matematica (Faculdade de Filosofia de S. Bento)
Assistente da Faculdade de Higiene e Saude Publica, Caixa Postal 99B, Sao Paulo,
Brazil.

Dudman, John A., B.A. (Reed College) Graduate student, Columbia University, 56 West 
70th St., New York ^S, New York. 

Edelson, Howard, B.A. (Ohio State Univ., Columbus, Ohio) Graduate student and Graduate 
Assistant, Ohio State University, 794 St., Columbus 6, Ohio. 

Feron, R., Licencié ès Sciences (Univ. of Paris) Attaché de Recherche, IS rue des Feuillantines, Paris V, France.

Franck, Edward Michel, Ingénieur A.I.A., Professor of the Royal Military School, 104 Rue
Pere Devroye, Woluwe St. Pierre, Belgium.

Garritsen, Florence M., B.A. (Univ. of Michigan) Research Assistant, General Motors 
Corp., 5151 Lillibridge Ave., Detroit 13, Michigan.

Gelsomini, Thea, Ph.D. (Univ. of Bocconi, Milano) Assistician of Statistics at Department 
of Statistics, University of Bocconi, Via A. Stoppi, N. 10, Milano, Italy. 

Goudswaard, G., Ph.D. (Univ. of Leiden) Director, Permanent Office, International Statistical Institute and Lecturer of Statistics, Rotterdam School of Economics and Free
University of Amsterdam, 2 Oostduinlaan, The Hague, Netherlands.

Gucker, Frank Fulton, A.B. (Harvard Univ.) Statistical Engineer, Remington Arms Co., 
Inc., 3175 Main Street, Bridgeport 6, Connecticut. 

Haberman, Sol, B.A. (Brooklyn College) Assistant Visiting Professor of Sociology, University of Puerto Rico, 187 Avenida los Flamboyanes, Rio Piedras, Puerto Rico.

Heimbach, Ernest E., M.B.A. (New York Univ.) Professor of Economics, Bergen College, 
Teaneck, New Jersey, 55 West 11th Street, New York 11, N. Y. 

Ishii, Shigeru, B.A. (Univ. of Ill.) Student at University of Illinois, 320-1 Peabody Drive,
Parade Ground Units, Champaign, Illinois. 

Jackson, James Edward, M.A. (Univ. of N.C.) Statistician, Color Control Dept., Eastman 
Kodak Company, 200 Pershing Drive, Rochester, New York. 

Jaspen, Nathan, Ph.D. (Pennsylvania State College) Research Assistant, Department of 
Psychology, Pennsylvania State College, State College, Pennsylvania. 

Jonhagen, Sven, Fil Lie. (Univ. of Stockholm) Chief Actuary and Assistant Teacher in 
Statistics at the University of Stockholm, Tegnergatan 36, Stockholm, Sweden. 

Kiefer, Jack C., M.S. (Mass. Inst. of Tech.) Student, Department of Mathematical Statistics, Columbia University, 3826 Middleton Avenue, Cincinnati 20, Ohio.

Kraft, Charles Hall, B.A. (Mich. State College) Instructor, Mathematics Department, 
Michigan State College, 707D Chestnut Road, East Lansing, Mich. 

Mayne, John W., M.Sc. (Brown Univ.) Graduate student in Mathematical Statistics, 330 
Furnald Hall, Columbia University, New York 27, New York. 

McCabe, William J., M.A. (George Washington Univ.) Chief Statistician, Transportation 
Corps, Department of the Army, 1725 South Oakland Street, Arlington, Virginia. 

Medin, Knut H., M.A. (Univ. of Uppsala) Assistant, Statistical Institute, University of 
Uppsala, Odinslund 2, Uppsala, Sweden. 

Mewborn, A. Boyd, Ph.D. (Calif. Inst. of Tech.) Associate Professor of Mathematics and
Mechanics, P.O. Box 1748, Monterey, California. 

Minton, Paul D., M.S. (Southern Methodist Univ.) Graduate student, University of North 
Carolina, P.O. Box 634, Chapel Hill, North Carolina. 

Morris, Doris N., M.A. (Columbia Teachers College) Economics Assistant, Western Electric 
Co., 101 West 72nd St., New York 23, New York. 

Morris, Robert H., B.A. (Swarthmore) Development Engineer, Color Control Department, 
Eastman Kodak Co., Rochester, New York. 

Rajalakshman, D. V., M.Sc. (Madras Univ.) Head, Department of Statistics, University
of Madras, Madras 5, S. India. 

Rudy, Norman, M.B.A. (Univ. of Chicago) Scientist, Ordnance Research Project, University 
of Chicago and Instructor in Economics, Roosevelt College, 7106 S. Crandon 
Avenue, Chicago, Illinois. 

Sakk, Kaarel, Fil. kand. (Univ. of Stockholm) Officer at Research Bureau of the State Foodstuffs 
Commission, Ostermalmsgatan 67 o.g. III, Stockholm, Sweden. 

Singh, Jagjit, B.A. (Punjab Univ.) Superintendent Transportation, E. I. Railway, Dinapore, 
c/o B.S. Bugga Esqr., Post Office Box 441, Calcutta, India. 

Starr, Henry H., Ph.D. (Univ. of Vienna) Research Manager, Converted Rice, Inc., P.O. 
Box 1762, Houston 1, Texas. 

Sverdrup, Erling, Actuarian (Univ. of Oslo) Lecturer in Mathematical Statistics, Institute 
of Mathematics, University of Oslo, Oslo, Norway. 

Talacko, Joseph Y., Ph.D. (Charles Univ., Prague) Assistant Professor of Mathematics, 
Marquette University, 260S So. 10th Street, Milwaukee 7, Wisconsin. 

Templeton, James G. C., A.M. (Princeton Univ.) Graduate student at Princeton University, 
Fine Hall, Princeton University, Princeton, New Jersey. 

Vaughan, Elizabeth, B.S. (Univ. of Washington) Statistician, U.S. Fish and Wildlife Service, 
2725 Montlake Boulevard, Seattle 2, Washington. 

Wilkinson, Bryan, M.A. (Univ. of Nebraska) Personnel Research Specialist, Prudential 
Insurance Co., Western Home Office, 4160-B Muirfield Road, Los Angeles -#5, California. 

Yevick, Mariam A. L., Ph.D. (Mass. Inst. of Tech.) Staff, Division of Statistical Engineering, 
National Broadcasting System, 9S1 Hudson St., Hoboken, New Jersey. 



REPORT ON THE BERKELEY MEETING OF THE INSTITUTE 


The thirty-ninth meeting and fifth regional West Coast meeting of the Institute 
of Mathematical Statistics was held on the Berkeley campus of the 
University of California, from Thursday June 16 through Saturday June 18, 
1949. The session on June 17 was held jointly with the Biometrics Section of the 
American Statistical Association and the Biometric Society (Western N. A. 
Region). Sixty-six persons registered, including the following fifty members of 
the Institute: 

Jane F. Andrian, G. A. Baker, Z. Wm. Birnbaum, Colin R. Blyth, Albert H. Bowker, 
Paul T. Bruyere, Chin Long Chiang, Edwin L. Crow, John H. Curtiss, R. C. Davis, Carl 
H. Denbow, W. J. Dixon, Mary Elveback, Mark Eudey, Edward A. Fay, Evelyn Fix, 
William R. Gaffey, H. H. Germond, M. A. Girshick, Jack Gysbers, Max Halperin, J. L. 
Hodges, Jr., John M. Howell, Harry M. Hughes, Cuthbert Hurd, Terry A. Jeeves, Mark 
Kac, H. S. Konijn, George M. Kuznets, Erich L. Lehmann, Richard F. Link, Michel Loève, 
Frank Massey, Lincoln E. Moses, Edith Mourier, Stanley W. Nash, J. Neyman, Edward 
Paulson, Stefan Peters, Raymond P. Peterson, Robert I. Piper, Gladys Rappaport, Mina 
Rees, David Rubinstein, Elizabeth L. Scott, Esther Seiden, Charles M. Stein, John E. 
Walsh, John Wishart, Zivia S. Wurtele. 

Those attending were welcomed at the Thursday morning session by Edward 
W. Strong, Associate Dean of the College of Letters and Science, University of 
California. Professor Z. William Birnbaum of the University of Washington 
presided. 

The program was as follows: 

1. Recent advances in the theory of the Wishart distribution. (Invited paper.) 
John Wishart, Cambridge University. 

2. Bayes, minimax, and other approaches to the multiple classification problem. 
(Invited paper.) M. A. Girshick, Stanford University. 

3. Some problems in sequential analysis. Charles M. Stein, University of 
California, Berkeley. 

Professor Jerzy Neyman of the University of California, Berkeley, presided 
at the Thursday afternoon session. Midway in the program there was an intermission 
for a tea given by the Statistical Laboratory, University of California. 
The program was as follows: 

1. Completeness in the sequential case, E. L. Lehmann and C. M. Stein, University of 
California, Berkeley. 

2. Some large sample tests for the median, John E. Walsh, The Rand Corporation. 

3. Continuous sampling plans from the risk point of view. Zivia S. Wurtele, Stanford 
University. 

4. Some problems in point estimation. J. L. Hodges, Jr. and E. L. Lehmann, University 
of California, Berkeley. 

5. Minimum variance in non-regular estimation. R. C. Davis, U. S. Naval Ordnance Test 
Station, Inyokern. 

6. Some aspects of links between prediction problems and problems of statistical estimation. 
Erling Sverdrup, University of Oslo. 


7. Extension of a theorem of Blackwell. (By title). Edward W. Barankin, University of 
California, Berkeley. 

8. Some two-sample tests. (By title). Douglas G. Chapman, University of California, 
Berkeley. 

9. On the existence of consistent tests. (By title). Agnes Berger, Columbia University. 

Professor F. W. Weymouth of Stanford University presided at the Friday 
morning session on biometrics. The program was as follows: 

1. Statistical problems arising from research in tuberculosis. Martha and Paul T. Bruyere, 
U. S. Public Health Service. 

2. Correlation of variability with growth rate in fish and mollusks. F. W. Weymouth, 
Stanford University. 

3. Some problems arising in plant selection and the use of analysis of variance. Stanley 
W. Nash, University of California, Berkeley. 

4. Studies of resistance of strawberry varieties and selections to verticillium wilt. R. E. Baker 
and G. A. Baker, University of California, Davis. 

5. A uniformity trial on unirrigated barley of ten years duration with implications for 
field trial designs. F. J. Veihmeyer, M. R. Huberty, and G. A. Baker, University of 
California, Davis and Los Angeles. 

On Friday afternoon those attending the meeting were entertained at a picnic 
luncheon at Stanford University, given by the Department of Statistics, Stanford 
University. 

Professor C. B. Morrey, Jr., of the University of California, Berkeley, presided 
at the Saturday morning session. The program consisted of the following invited 
papers: 

1. Methods for getting limiting distributions. Mark Kac, Cornell University. 

2. Almost certain convergence. Michel Loève, University of California, Berkeley. 

At 11 o'clock Saturday morning a business session was held, under the chairmanship 
of Professor Jerzy Neyman of the University of California, Berkeley, 
for the purpose of discussing future West Coast meetings. Plans for reviving the 
Statistical Research Memoirs were also discussed. 

On Saturday afternoon a final session for contributed papers was held under 
the chairmanship of Professor Albert H. Bowker of Stanford University. The 
program was as follows: 

1. Effect of linear truncation in a multinormal population. Z. William Birnbaum, University 
of Washington. 

2. Estimation in truncated samples. Max Halperin, The Rand Corporation. 

3. On the similar regions of a class of distributions. Stefan Peters, University of California, 
Berkeley. 

4. Auxiliary random variables. Mark W. Eudey, California Municipal Statistics. 

5. The ratio of ranges. Richard F. Link, University of Oregon. 

6. Statistical problems in the theory of Geiger counters. Colin R. Blyth, University of 
California, Berkeley. 

7. Asymptotic properties of the Wald-Wolfowitz test of randomness. (By title). Gottfried 
E. Noether, Columbia University. 


J. L. Hodges, Jr. 
Assistant Secretary 



LOCALLY BEST UNBIASED ESTIMATES* 

By E. W. Barankin 
University of California, Berkeley 

Summary. The problem of unbiased estimation, restricted only by the postulate 
of section 2, is considered here. For a chosen number $s > 1$, an unbiased estimate 
of a function $g$ on the parameter space is said to be best at the parameter 
point $\theta_0$ if its $s$th absolute central moment at $\theta_0$ is finite and not greater than that 
for any other unbiased estimate. A necessary and sufficient condition is obtained 
for the existence of an unbiased estimate of $g$. When one exists, the best one is 
unique. A necessary and sufficient condition is given for the existence of only 
one unbiased estimate with finite $s$th absolute central moment. The $s$th absolute 
central moment at $\theta_0$ of the best unbiased estimate (if it exists) is given explicitly 
in terms of only the function $g$ and the probability densities. It is, to be more 
precise, specified as the l.u.b. of a certain set $\mathcal{A}$ of numbers. The best estimate is 
then constructed (as a limit of a sequence of functions) with the use of only the 
data (relating to $g$ and the densities) associated with any particular sequence 
in $\mathcal{A}$ which converges to the l.u.b. of $\mathcal{A}$. 

The case $s = \infty$ is considered apart. The case $s = 2$ is studied in greater 
detail. Previous results of several authors are discussed in the light of the present 
theory. Generalizations of some of these results are deduced. Some examples 
are given to illustrate the applications of the theory. 

1. Introduction. Let $\Omega$ be a space of points $x$, and $\mu$ be a totally additive 
measure defined on a $\sigma$-field $\mathfrak{F}$ of subsets of $\Omega$. Let $\mathfrak{P} = \{p_\theta,\ \theta \in \Theta\}$ be a family of 
probability densities in $\Omega$ with respect to the measure $\mu$. $\Theta$ is any index set; we 
lay down no conditions on its structure. We are concerned here with the existence 
and characterization of unbiased estimates of a real-valued function $g$ on $\Theta$, 
which are in some suitable sense "best" for a prescribed parameter point $\theta_0$. 
That is, a real-valued, measurable ($\mu$) function $f_0$ on $\Omega$ such that 

(1) $\int_\Omega f_0\, p_\theta\, d\mu = g(\theta), \qquad \theta \in \Theta,$ 

and which satisfies a specified criterion of bestness for $\theta = \theta_0$. This criterion is 
usually taken to be 

(2) $\int_\Omega (f_0 - g(\theta_0))^2\, p_{\theta_0}\, d\mu \le \int_\Omega (f - g(\theta_0))^2\, p_{\theta_0}\, d\mu, \qquad f \in \mathfrak{M},$ 

where $\mathfrak{M}$ denotes the class of all unbiased estimates of $g$; i.e., the class of all $f$ 
satisfying (1). The obvious advantage in the definition (2) is the algebraic 
pliability. The obvious disadvantage is that $\mathfrak{M}$ may contain no estimate with 
finite variance (cf. section 9). 

* This article was prepared while the author was under contract with the Office of Naval 
Research. 

For the investigation of the fundamental questions, posed above, relating to 
unbiased estimates, we shall not restrict ourselves to (2). We consider chosen 
and fixed, a number $s > 1$, and lay down the 

Definition. $f_0 \in \mathfrak{M}$ is best at $\theta_0$ if 

$\infty > \int_\Omega |f_0 - g(\theta_0)|^s\, p_{\theta_0}\, d\mu \le \int_\Omega |f - g(\theta_0)|^s\, p_{\theta_0}\, d\mu, \qquad f \in \mathfrak{M}.$ 

With this, and under the condition of a rather natural postulate on $\mathfrak{P}$ (cf. section 
2), we exhibit a necessary and sufficient condition for the existence of an unbiased 
estimate of $g$ having a finite $s$th absolute central moment at $\theta_0$.² 

Except for the discussion, in section 3, of the case in which $g$ is constant on $\Theta$, 
we do not consider directly the estimation of $g$, but rather that of $h = g - g(\theta_0)$. 
Lemma 1, of section 2, gives the solution of the problem for $g$ when that for $h$ is 
known. After section 3, it is assumed exclusively that $h$ is not $\equiv 0$, except where 
the contrary is explicitly stated. 

In case $s$ is finite, the existence theorem, section 4, Theorem 2, asserts also the 
uniqueness of the best unbiased estimate of $h$. It is interesting to observe the 
similarity between the proof of this uniqueness and Fisher's proof of the (what 
might be called) asymptotic uniqueness of an efficient estimator [2, pp. 704, 705]. 
The case $s = \infty$³ is discussed in section 5; in this case we find that, in general, 
the best estimate is not unique. However, for $s$ both finite and infinite, and as 
well when $g$ is constant ($\therefore h \equiv 0$), we give a necessary and sufficient condition 
that there be a unique unbiased estimate with finite s.a.c.m.⁴ (cf. section 4, 
Corollary 2-1, and section 5, Theorem 3 (iii)). 

Theorem 2 determines the s.a.c.m. of the best estimate as the l.u.b. of a set of 
numbers given explicitly; and thereby, in particular, throws open the class of 
all lower bounds of the minimum s.a.c.m. Investigations after such lower bounds, 
in the classical case $s = 2$, have led to the well-known results of Cramér-Rao 
[3, p. 480, (32.3.3)], and Bhattacharyya [4, p. 3, (1.10)]. In section 6, which is 
devoted to obtaining various special lower bounds, we show how those particular 
bounds fall out. It should be remarked, however, that our conditions on $\mathfrak{P}$ are 
in general different from those of the above authors. 

² For the case $s = 2$ an alternative existence condition, antedating these results, but not 
yet published, has been obtained by C. Stein. 

³ If we use, in the above definition, the $s$th root of the $s$th absolute central moment, 
instead of the latter itself, then the bestness criterion for $s = \infty$ is the limiting criterion 
for $s \to \infty$; viz., 

$\infty > \operatorname*{ess\,sup}_{x \in \Omega} |f_0 - g(\theta_0)| \le \operatorname*{ess\,sup}_{x \in \Omega} |f - g(\theta_0)|, \qquad f \in \mathfrak{M},$ 

where ess. sup. refers to the measure $\nu(A) = \int_A p_{\theta_0}\, d\mu$. 

⁴ The abbreviation s.a.c.m. will henceforth be used to indicate $s$th absolute central 
moment at $\theta_0$. 


In section 7 we give, in Theorem 7 and its corollary, a construction of the 
best estimate, depending only on the knowledge of the minimum s.a.c.m. The 
latter, as indicated in the preceding paragraph, is always known independently 
of any knowledge of the best estimate. We use these results to obtain explicitly 
(Theorems 8 and 9) the best estimates, for arbitrary $s$, in two cases where 
we assume the minimum s.a.c.m. known. These cases, when $s = 2$, give the 
minimum variance as determined by the equality sign in the Cramér-Rao and 
Bhattacharyya inequalities, respectively. 

Section 8 is given to a brief discussion of the special case s = 2. Finally, in 
section 9, we present a detailed study of an example. 

At the suggestion of the referee we have added an appendix in which is given a 
brief running description of the fundamental ideas of Banach spaces that come 
into use here. The italicized phrases are those mentioned explicitly in the course 
of the paper. 

We shall merely mention here certain points which will be elaborated further 
in future communications. (1) The general theory developed here pertains as 
well to sequential as to nonsequential estimation; one has only to make the 
proper identification of $\Omega$, $\mu$, and $\mathfrak{P}$. Moreover, as applied to sequential 
estimation, the theory will determine the optimum stopping regions. (2) The 
discussion of section 5 below can be carried through with "ess. sup." referring 
to the measure $\mu$, and $\mathfrak{L}_1$ being the space of functions on $\Omega$ which are integrable 
($\mu$); and for this, no restrictions whatsoever on the densities $p_\theta$ are required 
(cf. the postulate of section 2), since the $p_\theta$ are elements of this $\mathfrak{L}_1$ solely by 
virtue of their properties as probability densities. This development would, for 
example, be sufficient to yield the estimate of Girshick, Mosteller, and Savage [5] 
in the case of sequential binomial estimation. Also, this unrestricted analysis is 
fundamental for the problem of similar regions (a case of the bounded unbiased 
estimation of a constant function). (3) For any $s > 1$ it may be observed in the 
result of Theorem 7 below, that the best (at $\theta_0$) estimate depends only on a 
sufficient statistic; this is clear from Neyman's theorem on sufficient statistics 
[6], since the best estimate depends only on ratios of the density functions $p_\theta$. 
But more than this, Blackwell's method [7] of deriving a uniformly (over the 
parameter set) better unbiased estimate from a given unbiased estimate can be 
proved to remain valid also when the measure of dispersion is the $s$th absolute 
central moment, $s > 1$. And for this, the postulate of section 2 is not required. 
(4) Finally, we point out that, with the proper specializations of $\Theta$, Cramér's 
theorem on the ellipsoid of concentration [8], Bhattacharyya's multidimensional 
inequality [9], and the extensions of the Rao, Cramér, and Bhattacharyya 
bounds to sequential estimation (as, for example, by Blackwell and Girshick 
[1], Wolfowitz [10], and Seth [20]) can be drawn from Theorem 4 below. 

The inspiration for the mode of analysis in the following pages, and the 
major part of its substance, come from F. Riesz: his book [11, Ch. III] and the 
article [12] (in particular sections 8-11 thereof). In strictly mathematical terminology, 
Theorems 2 and 3 are given in [11] for the sequence-spaces $l_r$; and 
Theorem 2 in [12] for the spaces $L_r$ of functions on the real interval [0, 1] with 
Lebesgue integrable $r$th powers. The proofs are given there for the case of a 
denumerable set $\Theta$; in [12] an indication is given of the extension to a non-denumerable 
$\Theta$. Our proof of Theorems 2 and 3, however, follows that given by 
Banach [13, p. 74] for the case of denumerable $\Theta$. It is based on two results, a 
theorem of Hahn-Banach [13, p. 55, Theorem 4], and the representation theorem 
(suitable for the general type of $\mathfrak{L}_r$ that we consider) for bounded linear functionals 
on $\mathfrak{L}_r$ [14, p. 338, Theorem 46]. The first of these, and the representation 
theorem for any $r > 1$, spring in fact from the same article [12, p. 475] of Riesz. 
In the case $r = 1$, the representation theorem is due originally to Steinhaus [15]; 
in the case $r = 2$, it was developed simultaneously in 1907 by Riesz [16] and 
Fréchet [17]. 

Riesz' proofs of the sufficiency of the condition in Theorem 2 proceed by 
constructing an explicit sequence of functions on $\Omega$ which converge strongly in 
$\mathfrak{L}_s$ to the (in the present statistical terminology) best estimate. Precisely, if in 
Theorem 7 below, we take, for each $n = 1, 2, \cdots$, the numbers $a_1^n, a_2^n, \cdots, a_{k_n}^n$ 
so that the expression 

$\dfrac{\left|\sum_{i=1}^{k_n} a_i^n\, h(\theta_i)\right|}{\left\|\sum_{i=1}^{k_n} a_i^n\, \pi_{\theta_i}\right\|_r}$ 

is maximum, then the assertion of this theorem is that of Riesz. However, 
Theorem 7 is established here without this strict requirement on the $a_i^n$. The 
dropping of this restriction was essential for the proofs of Theorems 8 and 9. 
The latter two theorems are, in fact, proved with the use of Corollary 7-1, 
which is an even stronger result than Theorem 7. This corollary falls out of the 
proof of Theorem 7 immediately, in consequence of our use of Lemma 2 for that 
proof. The lemma, moreover, eases the proof of Theorem 7 markedly, in doing 
away with the need for any differentiation. 


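2. Preliminary considerations. We begin then by introducing the absolutely 
continuous (with respect to $\mu$) measure, defined on $\mathfrak{F}$, 

$\nu(A) = \int_A p_{\theta_0}\, d\mu, \qquad A \in \mathfrak{F}.$ 

A function $\varphi$ is summable ($\nu$) over $\Omega$ if and only if $\varphi \cdot p_{\theta_0}$ is summable ($\mu$) over 
$\Omega$, and we have 

$\int_\Omega \varphi\, d\nu = \int_\Omega \varphi\, p_{\theta_0}\, d\mu$ 

(cf. [18, pp. 36-38]). Assuming that each of the ratios 

$\pi_\theta(x) = \dfrac{p_\theta(x)}{p_{\theta_0}(x)}, \qquad \theta \in \Theta,$ 

is defined almost everywhere ($\mu$) throughout $\Omega$, it follows that $f$ is an unbiased 
estimate of $g$ if and only if 

(3) $\int_\Omega f\, \pi_\theta\, d\nu = g(\theta), \qquad \theta \in \Theta.$ 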

We define 

$h(\theta) = g(\theta) - g(\theta_0).$ 

Since 

$\int_\Omega \pi_\theta\, d\nu = 1, \qquad \theta \in \Theta,$ 

it is clear from (3) that $f$ is an unbiased estimate of $g$ if and only if $f - g(\theta_0)$ is 
an unbiased estimate of $h$. Moreover, $f$ is best, for $g$, at $\theta_0$ if and only if $f - g(\theta_0)$ 
is best, for $h$, at $\theta_0$. 
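The change of measure above can be checked numerically on a finite sample space. The following is only an illustrative sketch: the binomial family, the three parameter values, and the estimate x/n are assumptions chosen for the example and do not appear in the paper. It verifies that the expectation of f under the density for theta agrees with the integral of f times the ratio pi_theta against nu, and that f - g(theta_0) is unbiased for h.

```python
from math import comb

def binom_pmf(n, theta):
    """Binomial(n, theta) probabilities: densities w.r.t. counting measure mu."""
    return [comb(n, x) * theta**x * (1 - theta)**(n - x) for x in range(n + 1)]

n, theta0 = 3, 0.5
thetas = [0.2, 0.5, 0.8]             # hypothetical finite parameter set
p0 = binom_pmf(n, theta0)            # p_{theta0}; nu(A) = sum of p0 over A
f = [x / n for x in range(n + 1)]    # candidate estimate of g(theta) = theta

def mean_mu(est, theta):
    """Integral of est * p_theta with respect to mu, as in equation (1)."""
    return sum(e * px for e, px in zip(est, binom_pmf(n, theta)))

def mean_nu(est, theta):
    """Integral of est * pi_theta with respect to nu, as in equation (3)."""
    pi = [px / q for px, q in zip(binom_pmf(n, theta), p0)]
    return sum(e * ratio * q for e, ratio, q in zip(est, pi, p0))

phi = [fx - theta0 for fx in f]      # f - g(theta0), a candidate estimate of h
rows = [(t, mean_mu(f, t), mean_nu(f, t), mean_nu(phi, t)) for t in thetas]
for t, m_mu, m_nu, m_h in rows:
    print(f"theta={t}: mu-form={m_mu:.6f}, nu-form={m_nu:.6f}, E[phi]={m_h:.6f}")
```

Both forms return theta itself, and the shifted estimate returns h(theta) = theta - theta_0, which is the content of the equivalence stated above.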

Define 

$r = \dfrac{s}{s - 1},$ 

and let $\mathfrak{L}_r$ and $\mathfrak{L}_s$ be the spaces, normed in the usual way, of real-valued functions 
on $\Omega$, with summable ($\nu$) absolute $r$th and $s$th powers, respectively. We denote 
the respective norms by $\|\ \|_r$ and $\|\ \|_s$; that is, if $\xi \in \mathfrak{L}_r$ and $\eta \in \mathfrak{L}_s$, 

$\|\xi\|_r = \left(\int_\Omega |\xi|^r\, d\nu\right)^{1/r}$ 

and 

$\|\eta\|_s = \left(\int_\Omega |\eta|^s\, d\nu\right)^{1/s}.$ 

We note that these spaces, for $s < \infty$, are weakly compact (cf. [21]). This 
property will be used in the proof of Theorem 7. Also, we shall make explicit use 
of the representation theorem for linear functionals on $\mathfrak{L}_r$ [14, p. 338, Theorem 46]. 
The assumptions on $\mathfrak{P}$, or on $\mathfrak{P}_0 = \{\pi_\theta,\ \theta \in \Theta\}$, will now be the following. 
Postulate: The functions $\pi_\theta$ are defined almost everywhere ($\mu$) in $\Omega$, and are 
elements of $\mathfrak{L}_r$. 

The foregoing considerations combine to give the following equivalence. 

Lemma 1. $\varphi_0 + g(\theta_0)$ is an unbiased estimate of $g$, which is best at $\theta_0$, if and 
only if (i) $\varphi_0$ satisfies the equations 

(4) $\int_\Omega \varphi_0\, \pi_\theta\, d\nu = h(\theta), \qquad \theta \in \Theta,$ 

and (ii) when $\varphi$ is any other function satisfying (4), we have 

$\|\varphi_0\|_s \le \|\varphi\|_s;$ 

that is, if and only if $\varphi_0$ is an unbiased estimate of $h$ with minimum (finite) norm in 
$\mathfrak{L}_s$. The s.a.c.m. of $\varphi_0 + g(\theta_0)$ is precisely $\|\varphi_0\|_s^s$. 

Starting with section 4, we shall deal directly with the estimation of h. 

3. The case of constant g. Throughout the remainder (section 4 et seq.) of 
this article, the function $h$ is assumed, unless the contrary is explicitly stated, 
to be non-constant; that is, since $h(\theta_0) = 0$, not $\equiv 0$. We can, and shall in this 
section, obtain the results of the desired kind for the case of a constant function $g$, 
by a brief, direct attack. 

Let $g(\theta) \equiv g_0$, a constant. Then of course $h(\theta) \equiv 0$. One unbiased estimate of $g$ 
is immediately obvious, viz., $f_1(x) \equiv g_0$. The s.a.c.m. of $f_1$ is 0. 

There will exist other⁵ unbiased estimates of $g$ with finite s.a.c.m. if and 
only if there exist non-null unbiased estimates, in $\mathfrak{L}_s$, of $h \equiv 0$. That is, by virtue 
of the isomorphism between $\mathfrak{L}_s$ and the space of linear functionals on $\mathfrak{L}_r$, there 
will exist an unbiased estimate of $g$ with finite s.a.c.m., distinct from $f_1$, if and 
only if there exists a non-null functional on $\mathfrak{L}_r$ which vanishes on the elements of 
$\mathfrak{P}_0 = \{\pi_\theta,\ \theta \in \Theta\}$. And a necessary and sufficient condition that such a functional 
exist is that $\mathfrak{P}_0$ be not a fundamental set in $\mathfrak{L}_r$ [13, p. 58, Theorem 7]. 

Observe finally that, in any case, $f_1$ is the unique unbiased estimate of $g$ with 
vanishing s.a.c.m. 

We collect these results in the following statement. 

Theorem 1. If $g(\theta) \equiv g_0$, a constant, then there is a unique best unbiased estimate 
of $g$; viz., $f_1(x) \equiv g_0$. And the s.a.c.m. of $f_1$ is 0. 

A necessary and sufficient condition that there exist no other unbiased estimates 
of $g$ having finite s.a.c.m. is that the set $\mathfrak{P}_0$ be fundamental in $\mathfrak{L}_r$. 

As an illustration of the ideas of this section, consider the following example: 
$\Omega$ is the real interval [0, 1]; $\mu$ is Lebesgue measure; $\Theta$ is the set of non-negative 
integers; and 

$p_\theta(x) = (\theta + 1)\, x^\theta.$ 

And take $\theta_0 = 0$. Then $\nu$ is again Lebesgue measure, and $\pi_\theta = p_\theta$ for each $\theta$. 
For definiteness, take $r = 2$ (the results in this case are the same for any $r \ge 1$). 
It is well known that the non-negative integer powers of $x$ form a fundamental 
set in $\mathfrak{L}_2$ on a finite real interval. That is, if $\xi$ is a function on [0, 1], such that 
$\int_0^1 \xi^2\, dx < \infty$, and if $\epsilon > 0$, then there exist an integer $n$ and coefficients $b_0$, 
$b_1, \cdots, b_n$ such that 

$\int_0^1 \Bigl(\xi - \sum_{i=0}^n b_i x^i\Bigr)^2 dx < \epsilon.$ 

Hence, in this case an unbiased estimate with finite variance at $\theta = 0$ is unique 
(as well for a non-constant function $g$ as for one which is constant over $\Theta$; cf. 
section 4, Corollary 2-1). 

⁵ That is, distinct from $f_1$ in the sense of $\mathfrak{L}_s$; or, equivalently, differing from $f_1$ on a set 
of positive ($\nu$) measure. Whenever, in the sequel, an equation $\xi_1 = \xi_2$ appears, for two 
functions $\xi_1$ and $\xi_2$ in $\mathfrak{L}_r$ or $\mathfrak{L}_s$, equality almost everywhere ($\nu$) in $\Omega$ will be understood. 
It is a consequence of our postulate that if two functions on $\Omega$ are equal almost everywhere 
($\nu$), they are equal almost everywhere ($\nu'$), where $\nu'$ is any one of the measures $\nu'(A) = \int_A p_\theta\, d\mu$, $\theta \in \Theta$. 
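The statement that the powers of x are fundamental in the square-integrable functions on [0, 1] can be made concrete for one test function. The sketch below is an illustration under stated assumptions, not part of the paper: the target xi(x) = sqrt(x) and the helper names `solve` and `l2_error` are hypothetical choices. It computes, in exact rational arithmetic, the squared distance from sqrt(x) to the polynomials of degree at most n by solving the normal equations, whose Gram matrix is the Hilbert matrix.

```python
from fractions import Fraction

def solve(A, b):
    """Gauss-Jordan elimination over exact fractions; returns x with A x = b."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        piv = next(k for k in range(c, n) if M[k][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for k in range(n):
            if k != c and M[k][c] != 0:
                factor = M[k][c]
                M[k] = [vk - factor * vc for vk, vc in zip(M[k], M[c])]
    return [M[k][n] for k in range(n)]

def l2_error(n):
    """Squared L2[0,1] distance from sqrt(x) to polynomials of degree <= n."""
    # Gram matrix of 1, x, ..., x^n: integral of x^(i+j) over [0, 1]
    gram = [[Fraction(1, i + j + 1) for j in range(n + 1)] for i in range(n + 1)]
    # integral of x^i * sqrt(x) over [0, 1] equals 2 / (2i + 3)
    moments = [Fraction(2, 2 * i + 3) for i in range(n + 1)]
    coeffs = solve(gram, moments)
    # integral of sqrt(x)^2 is 1/2; subtract the squared norm of the projection
    return Fraction(1, 2) - sum(b * m for b, m in zip(coeffs, moments))

errors = [l2_error(n) for n in range(5)]
for n, err in enumerate(errors):
    print(n, err, float(err))
```

The errors shrink as the degree grows, which is what the fundamental-set property asserts for this particular function; a single example of course proves nothing about the general claim.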

4. The main theorem for non-constant h. We shall denote by $\mathfrak{M}_s$ the class 
(or the set in $\mathfrak{L}_s$) of all unbiased estimates of $h$ that belong to $\mathfrak{L}_s$. 

Theorem 2. (i) A necessary and sufficient condition that $\mathfrak{M}_s$ be non-empty is 
that there exist a constant $C$ such that for every set of $n$ functions $\pi_{\theta_1}, \pi_{\theta_2}, \cdots, \pi_{\theta_n}$ 
in $\mathfrak{P}_0$, and every set of $n$ real numbers $a_1, a_2, \cdots, a_n$, we have, for every $n = 1, 2, \cdots$, 

(5) $\left|\sum_{i=1}^n a_i\, h(\theta_i)\right| \le C \left\|\sum_{i=1}^n a_i\, \pi_{\theta_i}\right\|_r.$ 

(ii) For every $\varphi \in \mathfrak{M}_s$ we have $\|\varphi\|_s \ge C_0$, where $C_0$ is the g.l.b. of the set of 
admissible constants $C$ in (5). 

(iii) If $\mathfrak{M}_s$ is non-empty there is a unique $\varphi_0 \in \mathfrak{M}_s$ with $\|\varphi_0\|_s = C_0$. Thus, 
$\varphi_0$ is the unique unbiased estimate of $h$ which is best at $\theta_0$. 

The non-constancy of $h$ clearly implies $C_0 > 0$. 

The necessity of condition (5) is immediate. Suppose $\varphi \in \mathfrak{M}_s$, so that $\varphi$ satisfies 
equations (4); then, for any $\theta_1, \theta_2, \cdots, \theta_n$, and any real numbers $a_1, a_2, \cdots, a_n$, 

$\sum_{i=1}^n a_i\, h(\theta_i) = \int_\Omega \varphi \cdot \sum_{i=1}^n a_i\, \pi_{\theta_i}\, d\nu.$ 

By the Hölder inequality it follows that 

$\left|\sum_{i=1}^n a_i\, h(\theta_i)\right| \le \|\varphi\|_s \left\|\sum_{i=1}^n a_i\, \pi_{\theta_i}\right\|_r.$ 

Hence (5) is satisfied with $C = \|\varphi\|_s$. 

Part (ii) of the theorem is hereby proved as well. 
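The necessity argument is a direct application of the Hölder inequality, and it can be watched at work on a small discrete space. The sketch below is illustrative only: the four-point space, the weights `nu`, the ratios `pis`, and the estimate `phi` are all made-up values, with h defined from phi through equations (4). Random coefficient vectors never push the ratio in (5) above the s-norm of phi.

```python
import random

def lp_norm(f, nu, p):
    """(integral of |f|^p d nu)^(1/p) on a finite space with weights nu."""
    return sum(abs(v) ** p * w for v, w in zip(f, nu)) ** (1.0 / p)

def integral(f, g, nu):
    return sum(a * b * w for a, b, w in zip(f, g, nu))

s = 3.0
r = s / (s - 1.0)                      # conjugate exponent
nu = [0.1, 0.2, 0.3, 0.4]              # hypothetical probability measure nu
pis = [[1.0, 1.0, 1.0, 1.0],           # pi_{theta0} is identically 1
       [2.0, 1.5, 1.0, 0.5],           # each ratio integrates to 1 against nu
       [0.5, 0.5, 0.5, 1.75]]
phi = [3.0, -1.0, 1.0, -1.0]           # an estimate with nu-mean 0
h = [integral(phi, pi, nu) for pi in pis]   # h(theta_i) defined via (4)

random.seed(0)
worst_ratio = 0.0
for _ in range(2000):
    a = [random.gauss(0.0, 1.0) for _ in pis]
    combo = [sum(ai * pi[x] for ai, pi in zip(a, pis)) for x in range(len(nu))]
    den = lp_norm(combo, nu, r)
    if den > 1e-12:                    # skip degenerate combinations
        num = abs(sum(ai * hi for ai, hi in zip(a, h)))
        worst_ratio = max(worst_ratio, num / den)

bound = lp_norm(phi, nu, s)
print("largest sampled ratio:", worst_ratio, " ||phi||_s:", bound)
```

Random sampling only explores the set of ratios from below, so the largest sampled value is a lower estimate of the constant C_0, not its exact value.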

Suppose $\mathfrak{M}_s$ non-empty, and $\varphi_0$, $\varphi_1$ in $\mathfrak{M}_s$, such that $\|\varphi_0\|_s = \|\varphi_1\|_s = C_0$. 
Then $\tfrac{1}{2}(\varphi_0 + \varphi_1) \in \mathfrak{M}_s$ and therefore 

$\tfrac{1}{2}\|\varphi_0 + \varphi_1\|_s \ge C_0.$ 

But, by the Minkowski inequality, 

$\tfrac{1}{2}\|\varphi_0 + \varphi_1\|_s \le \tfrac{1}{2}\left(\|\varphi_0\|_s + \|\varphi_1\|_s\right) = C_0.$ 

Hence 

$\|\varphi_0 + \varphi_1\|_s = \|\varphi_0\|_s + \|\varphi_1\|_s.$ 

This equality implies $\varphi_1 = a\varphi_0$ for some positive $a$. But since the norms of $\varphi_0$ 
and $\varphi_1$ are equal (and $\ne 0$) $a$ must be unity. Thus the uniqueness of $\varphi_0$ is proved. 

It remains now to prove, assuming (5) satisfied, the existence of $\varphi_0$. Consider 
the functional $F$ on $\mathfrak{P}_0$ defined by 

$F(\pi_\theta) = h(\theta).$ 

The Hahn-Banach theorem alluded to in section 1 (viz., [13, p. 55, Theorem 4]) 
has precisely (5) as a necessary and sufficient condition for the existence of a 
linear functional $G$ on $\mathfrak{L}_r$ satisfying 

(a) $G(\pi_\theta) = h(\theta), \qquad \theta \in \Theta;$ 

(b) $\|G\| \le C;$ 

where $\|G\|$ is the norm of $G$, i.e., 

$\|G\| = \operatorname*{l.u.b.}_{\xi \in \mathfrak{L}_r} \dfrac{|G(\xi)|}{\|\xi\|_r}.$ 

In particular, taking $C = C_0$, there is a linear functional $G_0$ on $\mathfrak{L}_r$ with 

(a') $G_0(\pi_\theta) = h(\theta), \qquad \theta \in \Theta;$ 

(b') $\|G_0\| \le C_0.$ 

But, for an element $\xi = \sum_{i=1}^n a_i\, \pi_{\theta_i}$ in the linear manifold $[\mathfrak{P}_0]$ spanned by the $\pi_\theta$, 

$G_0(\xi) = \sum_{i=1}^n a_i\, h(\theta_i),$ 

so that 

$\|G_0\| \ge \operatorname*{l.u.b.}_{\xi \in [\mathfrak{P}_0]} \dfrac{|G_0(\xi)|}{\|\xi\|_r} = C_0.$ 

Hence (b') is replaced by the precise statement 

(b'') $\|G_0\| = C_0.$ 

Now the representation theorem for linear functionals on $\mathfrak{L}_r$ asserts the existence 
of $\varphi_0 \in \mathfrak{L}_s$, such that 

$G_0(\xi) = \int_\Omega \varphi_0\, \xi\, d\nu, \qquad \xi \in \mathfrak{L}_r,$ 

and 

$\|\varphi_0\|_s = \|G_0\| = C_0.$ 

This taken with (a') establishes the existence of $\varphi_0 \in \mathfrak{L}_s$ satisfying 

$\int_\Omega \varphi_0\, \pi_\theta\, d\nu = h(\theta), \quad \theta \in \Theta; \qquad \|\varphi_0\|_s = C_0,$ 

and this completes the proof of the theorem. 



It is readily seen that $\mathfrak{M}_s$ will consist of more than just $\varphi_0$ if and only if there 
exists a non-null functional on $\mathfrak{L}_r$ which vanishes on $\mathfrak{P}_0$. Our discussion in 
section 3 therefore enables us to assert the following. 

Corollary 2-1. $\mathfrak{M}_s$, when it is non-empty, consists of $\varphi_0$ alone if and only if 
$\mathfrak{P}_0$ is fundamental in $\mathfrak{L}_r$. 

A word is in order concerning the following two consequences of the boundedness 
of the measure $\nu$: (i) if $\mathfrak{P}_0 \subset \mathfrak{L}_r$, then also $\mathfrak{P}_0 \subset \mathfrak{L}_{r'}$ for every $r' < r$; (ii) if 
$\varphi \in \mathfrak{L}_s$ then also $\varphi \in \mathfrak{L}_{s'}$ for every $s' < s$. Otherwise stated: (i') if $\mathfrak{P}_0$ satisfies the 
postulate of section 2 for the number $r$, it likewise satisfies this postulate for 
every (admissible) $r' < r$; (ii') if $\mathfrak{M}_s$ is non-empty, then $\mathfrak{M}_{s'}$ is non-empty for 
every $s' < s$. Regarding (i') we shall make only the obvious remark that although 
$\mathfrak{P}_0$ satisfies the postulate for every $r' < r$, there may be values of $r' < r$ such 
that no $C$ for (5) exists; this will be exemplified in section 9. Where (ii') is concerned, 
it is clear that the non-emptiness of $\mathfrak{M}_s$ will not necessarily imply that 
$\mathfrak{P}_0 \subset \mathfrak{L}_{s'/(s'-1)}$ for every $s' < s$, even though for every such $s'$ $\mathfrak{M}_{s'}$ is non-empty. 

If for every $\theta \in \Theta$ other than $\theta_0$ we have $\pi_\theta \notin \mathfrak{L}_{s'/(s'-1)}$, for some particular $s' < s$, 
then we may have the situation in which there are elements in $\mathfrak{M}_{s'}$ with norms 
arbitrarily close to 0. However, this cannot be the case if (a) for some $\theta$ other 
than $\theta_0$, $\pi_\theta \in \mathfrak{L}_{s'/(s'-1)}$, and (b) $h$ does not vanish identically on $\Theta'$, the set of those 
$\theta$ for which $\pi_\theta \in \mathfrak{L}_{s'/(s'-1)}$. For, when these two conditions are satisfied, Theorem 
2 applies to $h$ as defined on $\Theta'$; consequently there is a positive lower bound 
for the $s'$-norms of the unbiased estimates of $h$ over $\Theta'$. And since every element 
of $\mathfrak{M}_{s'}$ is, in particular, an unbiased estimate of $h$ over $\Theta'$, it follows that 
the norms of those elements are bounded below by a positive number. 
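The two consequences of the boundedness of the measure both rest on the fact that, for a probability measure, the p-norm of a fixed function does not decrease as p increases. A small numerical check follows; it is a sketch with made-up weights and function values, included only to make the inclusion of the larger-exponent space in the smaller-exponent one tangible.

```python
def lp_norm(f, nu, p):
    """(integral of |f|^p d nu)^(1/p) on a finite space with weights nu."""
    return sum(abs(v) ** p * w for v, w in zip(f, nu)) ** (1.0 / p)

nu = [0.1, 0.2, 0.3, 0.4]          # hypothetical probability measure (total mass 1)
phi = [3.0, -1.0, 1.0, -1.0]       # hypothetical function on the four points
exponents = [1.5, 2.0, 3.0, 6.0]
norms = [lp_norm(phi, nu, p) for p in exponents]
for p, value in zip(exponents, norms):
    print(f"p={p}: norm={value:.6f}")
```

On a finite space every function lies in every such space, so the example shows only the ordering of the norms; the failure of the reverse inclusions requires an infinite sample space.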

5. The case $s = \infty$ ($r = 1$). Let $\mathfrak{M}_\infty$ denote the class of essentially bounded 
($\nu$) unbiased estimates of $h$; and let bestness at $\theta_0$ be defined with respect to the 
essential absolute suprema of the elements of this class. That is, the unbiased 
estimate $\varphi_0$, of $h$, is best at $\theta_0$ if 

$\operatorname*{ess\,sup}_{x \in \Omega} |\varphi_0(x)| < \infty,$ 

and if, when $\varphi$ is another unbiased estimate of $h$, we have 

$\operatorname*{ess\,sup}_{x \in \Omega} |\varphi_0(x)| \le \operatorname*{ess\,sup}_{x \in \Omega} |\varphi(x)|.$ 

The fundamental postulate for the functions $\pi_\theta$ is, in this case, that $\mathfrak{P}_0 \subset \mathfrak{L}_1$. 

Now, $\mathfrak{L}_\infty$, the space of essentially bounded, measurable ($\nu$) functions on $\Omega$, 
normed by ess. sup., is the space of linear functionals on $\mathfrak{L}_1$ [14, p. 338]. Examination 
of the proof of Theorem 2 will show that that proof goes through also in the 
present case in all but one detail: we cannot here in general prove the uniqueness 
of the best estimate. The proof of uniqueness breaks down since the equality 

$\operatorname{ess\,sup} |\varphi_0(x) + \varphi_1(x)| = \operatorname{ess\,sup} |\varphi_0(x)| + \operatorname{ess\,sup} |\varphi_1(x)|$ 

does not imply that $\varphi_1$ is a constant multiple of $\varphi_0$. Of course, if $\mathfrak{P}_0$ is fundamental 
in $\mathfrak{L}_1$, we have a fortiori the uniqueness of the best estimate. 

The results for the case $s = \infty$ are then the following. 

Theorem 3. (i) A necessary and sufficient condition that $\mathfrak{M}_\infty$ be non-empty is 
that there exist a constant $C$ such that for every set of $n$ functions $\pi_{\theta_1}, \cdots, \pi_{\theta_n}$ 
in $\mathfrak{P}_0$, and every set of $n$ real numbers $a_1, a_2, \cdots, a_n$, we have, for every $n = 1, 2, \cdots$, 

$\left|\sum_{i=1}^n a_i\, h(\theta_i)\right| \le C \left\|\sum_{i=1}^n a_i\, \pi_{\theta_i}\right\|_1.$ 

(ii) For every $\varphi \in \mathfrak{M}_\infty$ we have $\|\varphi\|_\infty \ge C_0$, where $C_0$ is the g.l.b. of the set of 
admissible constants $C$ above. 

(iii) When $\mathfrak{M}_\infty$ is non-empty, it contains elements with norm equal to $C_0$. These 
are the best (at $\theta_0$) unbiased estimates of $h$. When $\mathfrak{P}_0$ is not fundamental in $\mathfrak{L}_1$, 
there need not exist a unique best estimate. 

We close this section with the remark that Theorem 1 remains valid, as it 
stands, in the case $s = \infty$. 
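The failure of uniqueness in the sup-norm case is easy to exhibit on a three-point space. The example below is a constructed illustration, not one taken from the paper: the measure `nu`, the ratio `pi1`, and the values of h are assumptions chosen so that the arithmetic is exact. Two different functions satisfy the unbiasedness equations, both have essential supremum 2, and one admissible choice of coefficients in the ratio of Theorem 3 already gives the lower bound 2, so both are best.

```python
from fractions import Fraction as Fr

nu = [Fr(1, 3)] * 3                      # hypothetical measure nu on three points
pi0 = [Fr(1), Fr(1), Fr(1)]              # ratio for theta0, identically 1
pi1 = [Fr(3, 2), Fr(3, 2), Fr(0)]        # ratio for a second point theta1
h0, h1 = Fr(0), Fr(1)                    # h(theta0) = 0, h(theta1) = 1

def integral(f, g):
    return sum(a * b * w for a, b, w in zip(f, g, nu))

def is_unbiased(phi):
    """Check the unbiasedness equations against both ratios."""
    return integral(phi, pi0) == h0 and integral(phi, pi1) == h1

def sup_norm(phi):
    """Essential supremum of |phi|: points of zero nu-mass are ignored."""
    return max(abs(v) for v, w in zip(phi, nu) if w > 0)

phi_a = [Fr(1), Fr(1), Fr(-2)]
phi_b = [Fr(2), Fr(0), Fr(-2)]

# One admissible choice of coefficients: a0 = -3/2 on pi0 and a1 = 1 on pi1.
a0, a1 = Fr(-3, 2), Fr(1)
combo = [a0 * u + a1 * v for u, v in zip(pi0, pi1)]
l1_norm = sum(abs(v) * w for v, w in zip(combo, nu))
lower_bound = abs(a0 * h0 + a1 * h1) / l1_norm

print(is_unbiased(phi_a), is_unbiased(phi_b))
print(sup_norm(phi_a), sup_norm(phi_b), lower_bound)
```

Here the third coordinate of any unbiased estimate is forced to equal -2, while the first two need only sum to 2, which is where the freedom comes from.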

6. Particular lower bounds for the minimum s.a.c.m. In order to stress their 
significance in the statistical context, we shall give the statements of this section 
with the help of the symbol $\sigma_s(\varphi)$ for the $s$th root of the s.a.c.m. of the unbiased 
estimate $\varphi$, of $h$. We have of course, the relation 

$\sigma_s(\varphi) = \|\varphi\|_s.$ 

Now, one of the most important aspects of Theorem 2 is that it presents us 
immediately with an explicit evaluation of the minimum $\sigma_s(\varphi)$ for all $\varphi \in \mathfrak{M}_s$. 
We state the formula in the form of a theorem. 

Theorem 4. Let $\mathfrak{R}$ denote the set of all real numbers. Then, 

$\operatorname*{g.l.b.}_{\varphi \in \mathfrak{M}_s} \sigma_s(\varphi) = \operatorname*{l.u.b.}_{\substack{\theta_1, \cdots, \theta_n \in \Theta \\ a_1, \cdots, a_n \in \mathfrak{R} \\ n = 1, 2, \cdots}} \dfrac{\left|\sum_{i=1}^n a_i\, h(\theta_i)\right|}{\left\|\sum_{i=1}^n a_i\, \pi_{\theta_i}\right\|_r}.$ 

For brevity, let us set 

$\operatorname*{g.l.b.}_{\varphi \in \mathfrak{M}_s} \sigma_s(\varphi) = \sigma_s^{\min}.$ 

Since this theorem expresses $\sigma_s^{\min}$ as the l.u.b. of an explicit set of numbers, 
it is clear that the class of all lower bounds of $\sigma_s^{\min}$ is thereby thrown open to us. 
It follows that, when $s = r = 2$ and our hypotheses on $\mathfrak{P}$ are fulfilled, the classical 
lower bounds of Cramér-Rao [3, p. 480] and Bhattacharyya [4, p. 3] are particularized 
consequences of Theorem 4. In the results that follow here we shall 
indicate the deduction of those classical bounds. We need not, however, restrict $s$. 
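When s = r = 2 and the parameter set is finite, the least upper bound in Theorem 4 can be computed in closed form: by the Cauchy-Schwarz inequality the supremum of (a·h)² / (a'Ga) over coefficient vectors a equals h'G⁻¹h, where G is the Gram matrix of the ratios. The sketch below works this out for an assumed setting that is not in the paper (a binomial sample of size 4, a three-point parameter set, g(theta) = theta) and checks that the function built from the same coefficients is unbiased and attains the bound.

```python
from math import comb

def binom_pmf(n, theta):
    return [comb(n, x) * theta**x * (1 - theta)**(n - x) for x in range(n + 1)]

def solve(A, b):
    """Gaussian elimination with partial pivoting (floating point)."""
    m = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for c in range(m):
        piv = max(range(c, m), key=lambda k: abs(M[k][c]))
        M[c], M[piv] = M[piv], M[c]
        for k in range(c + 1, m):
            factor = M[k][c] / M[c][c]
            M[k] = [vk - factor * vc for vk, vc in zip(M[k], M[c])]
    x = [0.0] * m
    for c in reversed(range(m)):
        tail = sum(M[c][j] * x[j] for j in range(c + 1, m))
        x[c] = (M[c][m] - tail) / M[c][c]
    return x

n, theta0 = 4, 0.5
thetas = [0.5, 0.3, 0.7]                 # hypothetical finite parameter set
p0 = binom_pmf(n, theta0)                # weights of the measure nu
pis = [[p / q for p, q in zip(binom_pmf(n, t), p0)] for t in thetas]
h = [t - theta0 for t in thetas]         # h = g - g(theta0) with g(theta) = theta

def inner(f, g):
    """Integral of f * g with respect to nu."""
    return sum(a * b * w for a, b, w in zip(f, g, p0))

gram = [[inner(pi, pj) for pj in pis] for pi in pis]
c = solve(gram, h)
bound_sq = sum(ci * hi for ci, hi in zip(c, h))        # (l.u.b.)^2 = h' G^{-1} h
phi0 = [sum(ci * pi[x] for ci, pi in zip(c, pis)) for x in range(n + 1)]
norm_sq = inner(phi0, phi0)                            # variance of phi0 at theta0
var_sample_mean = theta0 * (1 - theta0) / n            # variance of x/n at theta0

print("bound^2 =", bound_sq, " ||phi0||^2 =", norm_sq, " Var(x/n) =", var_sample_mean)
```

Because unbiasedness is required only at three parameter values here, the minimum variance comes out slightly below that of x/n, which is unbiased for every theta.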

For a moment, let us denote by $\pi(x)$ the function on $\Theta$ which assigns the 
value $\pi_\rho(x)$ to the point $\rho \in \Theta$, and let $\Theta$ be an interval on the real axis. Then we 
shall, below, write $\pi'_\theta$ for the function (when it exists) on $\Omega$ which assigns the 
value $(d\pi(x)/d\rho)_{\rho = \theta}$ to $x \in \Omega$. Similarly, $\pi''_\theta$ for the function assigning the value 
$(d^2\pi(x)/d\rho^2)_{\rho = \theta}$ to $x$; and so on. 

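Theorem 5. Suppose the following conditions fulfilled: 

(i) $\Theta$ is an interval on the real axis; 

(ii) $h$ is differentiable on $\Theta' \subseteq \Theta$; 

(iii) for each $\theta \in \Theta'$, $\pi'_\theta$ is defined almost everywhere ($\nu$), and is an element of $\mathfrak{L}_r$; 

(iv) for each $\theta \in \Theta'$, 

$\lim_{\rho \to \theta} \left\| \dfrac{\pi_\rho - \pi_\theta}{\rho - \theta} - \pi'_\theta \right\|_r = 0.$ 

Then, for any $m + n$ ($m, n = 1, 2, \cdots$) points $\theta_1, \theta_2, \cdots, \theta_m$ in $\Theta$, and $\theta'_1, \theta'_2, 
\cdots, \theta'_n$ in $\Theta'$, and any $m + n$ real numbers $a_1, a_2, \cdots, a_m, b_1, b_2, \cdots, b_n$ such that 

$\left\| \sum_{i=1}^m a_i\, \pi_{\theta_i} + \sum_{i=1}^n b_i\, \pi'_{\theta'_i} \right\|_r \ne 0,$ 

we have 

(6) $\sigma_s^{\min} \ge \dfrac{\left| \sum_{i=1}^m a_i\, h(\theta_i) + \sum_{i=1}^n b_i\, h'(\theta'_i) \right|}{\left\| \sum_{i=1}^m a_i\, \pi_{\theta_i} + \sum_{i=1}^n b_i\, \pi'_{\theta'_i} \right\|_r}.$ 

The prime on the $h$ in (6) denotes the derivative of $h$. 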


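To prove this theorem, observe first that by virtue of Theorem 4, we may write

$$\sigma_s^{\min} \ge \frac{\left| \sum_{i=1}^{m} a_i h(\theta_i) + \sum_{i=1}^{n} b_i \dfrac{h(\rho_i) - h(\theta'_i)}{\rho_i - \theta'_i} \right|}{\left\| \sum_{i=1}^{m} a_i \pi_{\theta_i} + \sum_{i=1}^{n} b_i \dfrac{\pi_{\rho_i} - \pi_{\theta'_i}}{\rho_i - \theta'_i} \right\|_r}$$

for every set of points $\rho_1, \rho_2, \cdots, \rho_n$ in $\Theta$ such that the denominator of the right-hand
side is defined and $\ne 0$. Therefore, also

$$\sigma_s^{\min} \ge \limsup_{\substack{\rho_i \to \theta'_i \\ i = 1, 2, \cdots, n}} \frac{\left| \sum_{i=1}^{m} a_i h(\theta_i) + \sum_{i=1}^{n} b_i \dfrac{h(\rho_i) - h(\theta'_i)}{\rho_i - \theta'_i} \right|}{\left\| \sum_{i=1}^{m} a_i \pi_{\theta_i} + \sum_{i=1}^{n} b_i \dfrac{\pi_{\rho_i} - \pi_{\theta'_i}}{\rho_i - \theta'_i} \right\|_r}. \tag{7}$$

Now, by condition (iv), the element

$$\sum_{i=1}^{m} a_i \pi_{\theta_i} + \sum_{i=1}^{n} b_i \frac{\pi_{\rho_i} - \pi_{\theta'_i}}{\rho_i - \theta'_i}$$

of $\mathfrak{L}_r$ converges, in the strong sense in $\mathfrak{L}_r$, to

$$\sum_{i=1}^{m} a_i \pi_{\theta_i} + \sum_{i=1}^{n} b_i \pi'_{\theta'_i},$$

as $\rho_i \to \theta'_i$, $i = 1, 2, \cdots, n$. Consequently we have convergence of the norm;
that is, the denominator of the right-hand side of (7) converges to the denominator
of (6). (The latter is $\ne 0$, so that for all $\rho_i$ sufficiently close to $\theta'_i$, $i =
1, 2, \cdots, n$, the ratios in (7) are defined.) There is no difficulty about the
convergence of the numerator of (7) to that of (6). The theorem is thus proved.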

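Corollary 5-1. Under the hypothesis of Theorem 5, we have, in particular,
when $\theta_0 \in \Theta'$ and $\|\pi'_{\theta_0}\|_r \ne 0$,

$$\sigma_s^{\min} \ge \frac{|h'(\theta_0)|}{\|\pi'_{\theta_0}\|_r}. \tag{8}$$

If we denote by $p$ the function on $\Omega \times \Theta$ which assigns the value $p_\theta(x)$ to the
point $(x, \theta)$, and write (8) in the form

$$\sigma_s^{\min} \ge \frac{|h'(\theta_0)|}{\left\{ \displaystyle\int_\Omega \left| \frac{\partial \log p}{\partial \theta} \right|^r_{\theta = \theta_0} p_{\theta_0}\, d\mu \right\}^{1/r}}, \tag{8'}$$

the generalization of the Cramér-Rao inequality afforded by (8) becomes
evident.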
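
As a hedged numerical illustration of how the difference quotients in the proof of Theorem 5 approach the bound (8), consider a single observation from $N(\theta, 1)$ with $\theta_0 = 0$ and $h(\theta) = \theta$. This choice, and the function names below, are assumptions made only for the example. Here $\pi_\rho(x) = \exp\{\rho x - \rho^2/2\}$, so that $(\pi_a, \pi_b) = e^{ab}$ under $\nu = N(0, 1)$ and $\|\pi_\rho - \pi_0\|_2^2 = e^{\rho^2} - 1$. The sketch evaluates the ratio $|h(\rho) - h(0)| / \|\pi_\rho - \pi_0\|_2$, which increases to the Cramér-Rao value $|h'(0)| / \|\pi'_0\|_2 = 1$ as $\rho \to 0$.

```python
import math


def difference_quotient_bound(rho):
    """Ratio |h(rho) - h(0)| / ||pi_rho - pi_0||_2 for N(theta, 1), theta0 = 0, h(theta) = theta.

    Under nu = N(0, 1) one has (pi_a, pi_b) = exp(a * b), hence
    ||pi_rho - pi_0||_2^2 = exp(rho^2) - 1.
    """
    if rho == 0:
        raise ValueError("rho must differ from theta0 = 0")
    return abs(rho) / math.sqrt(math.expm1(rho * rho))


def cramer_rao_bound():
    """Limit value |h'(0)| / ||pi'_0||_2; here pi'_0(x) = x, whose L2(nu) norm is 1."""
    return 1.0


if __name__ == "__main__":
    # Each printed ratio is a lower bound for sigma_2^min; they increase as rho shrinks.
    for rho in (2.0, 1.0, 0.5, 0.1, 0.001):
        print(rho, difference_quotient_bound(rho))
```

Every such ratio is itself a valid lower bound by Theorem 4; the derivative bound (8) is the best one obtainable from this two-point family in this particular example.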

Using the result and method of Theorem 5, we can establish the next in a 
hierarchy of theorems. 

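Theorem 6. Suppose the hypothesis of Theorem 5 satisfied, and the following
condition fulfilled: for each $\theta$ in a non-empty subset $\Theta''$ of $\Theta'$, (i) $h''(\theta)$ (the second
derivative) exists and (ii) $\pi''_\theta$ is defined almost everywhere ($\nu$), is an element of $\mathfrak{L}_r$,
and satisfies

$$\lim_{\rho \to \theta} \left\| \frac{\pi'_\rho - \pi'_\theta}{\rho - \theta} - \pi''_\theta \right\|_r = 0.$$

Then, for any $m + n + q$ ($m, n, q = 1, 2, \cdots$) points $\theta_1, \theta_2, \cdots, \theta_m$ in
$\Theta$, $\theta'_1, \theta'_2, \cdots, \theta'_n$ in $\Theta'$, and $\theta''_1, \theta''_2, \cdots, \theta''_q$ in $\Theta''$, and any $m + n + q$ real numbers
$a_1, a_2, \cdots, a_m, b_1, b_2, \cdots, b_n, c_1, c_2, \cdots, c_q$ such that

$$\left\| \sum_{i=1}^{m} a_i \pi_{\theta_i} + \sum_{i=1}^{n} b_i \pi'_{\theta'_i} + \sum_{i=1}^{q} c_i \pi''_{\theta''_i} \right\|_r \ne 0,$$

we have

$$\sigma_s^{\min} \ge \frac{\left| \sum_{i=1}^{m} a_i h(\theta_i) + \sum_{i=1}^{n} b_i h'(\theta'_i) + \sum_{i=1}^{q} c_i h''(\theta''_i) \right|}{\left\| \sum_{i=1}^{m} a_i \pi_{\theta_i} + \sum_{i=1}^{n} b_i \pi'_{\theta'_i} + \sum_{i=1}^{q} c_i \pi''_{\theta''_i} \right\|_r}.$$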




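Just as in the case of the previous theorem, we have here an immediate corollary.

Corollary 6-1. Under the hypothesis of Theorem 6, we have in particular, when
$\theta_0 \in \Theta''$,

$$\sigma_s^{\min} \ge \frac{|b\, h'(\theta_0) + c\, h''(\theta_0)|}{\|b\, \pi'_{\theta_0} + c\, \pi''_{\theta_0}\|_r} \tag{9}$$

for any two real numbers, $b$ and $c$, such that the denominator of the right-hand side
does not vanish.

Consider (9) in the particular case $s = r = 2$. In this case, (9) may be written,
explicitly,

$$(\sigma_2^{\min})^2 \ge \frac{[b\, h'(\theta_0) + c\, h''(\theta_0)]^2}{\displaystyle\int_\Omega \frac{1}{p_{\theta_0}} \left( b \frac{\partial p}{\partial \theta} + c \frac{\partial^2 p}{\partial \theta^2} \right)^2_{\theta = \theta_0} d\mu}. \tag{10}$$

In particular, (10) holds for values of $b$ and $c$ which maximize the right-hand
side. And that maximum value is found, in the usual way, to be

$$J^{11}[h'(\theta_0)]^2 + 2J^{12} h'(\theta_0) h''(\theta_0) + J^{22}[h''(\theta_0)]^2,$$

where the matrix

$$\begin{pmatrix} J^{11} & J^{12} \\ J^{12} & J^{22} \end{pmatrix}$$

is the inverse of the matrix

$$\begin{pmatrix} \displaystyle\int_\Omega \frac{1}{p_{\theta_0}} \left( \frac{\partial p}{\partial \theta} \right)^2_{\theta=\theta_0} d\mu & \displaystyle\int_\Omega \frac{1}{p_{\theta_0}} \left( \frac{\partial p}{\partial \theta} \frac{\partial^2 p}{\partial \theta^2} \right)_{\theta=\theta_0} d\mu \\[2ex] \displaystyle\int_\Omega \frac{1}{p_{\theta_0}} \left( \frac{\partial p}{\partial \theta} \frac{\partial^2 p}{\partial \theta^2} \right)_{\theta=\theta_0} d\mu & \displaystyle\int_\Omega \frac{1}{p_{\theta_0}} \left( \frac{\partial^2 p}{\partial \theta^2} \right)^2_{\theta=\theta_0} d\mu \end{pmatrix}.$$

Thus, we have

$$(\sigma_2^{\min})^2 \ge J^{11}[h'(\theta_0)]^2 + 2J^{12} h'(\theta_0) h''(\theta_0) + J^{22}[h''(\theta_0)]^2. \tag{11}$$

This is seen to be Bhattacharyya's result for the case of derivatives up to second
order.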
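
A small worked instance may make the second-order bound concrete. The sketch below is only an illustration under assumed data: one observation from $N(\theta, 1)$, $\theta_0 = 0$, and $h(\theta) = \theta^2$, none of which is taken from the text at this point. At $\theta_0 = 0$ one has $(1/p)\,\partial p/\partial\theta = x$ and $(1/p)\,\partial^2 p/\partial\theta^2 = x^2 - 1$, so the matrix entries reduce to standard normal moments. The function name is hypothetical.

```python
def bhattacharyya_second_order(h1, h2, j11, j12, j22):
    """Quadratic form J^{11} h1^2 + 2 J^{12} h1 h2 + J^{22} h2^2.

    (J^{ij}) is the inverse of the symmetric matrix [[j11, j12], [j12, j22]];
    h1 and h2 stand for h'(theta0) and h''(theta0).
    """
    det = j11 * j22 - j12 * j12
    if det == 0:
        raise ValueError("the matrix (J_ij) must be non-singular")
    inv11, inv12, inv22 = j22 / det, -j12 / det, j11 / det
    return inv11 * h1 * h1 + 2 * inv12 * h1 * h2 + inv22 * h2 * h2


# Assumed example: N(theta, 1), theta0 = 0, h(theta) = theta^2.
# Under N(0, 1): E[x^2] = 1, E[x (x^2 - 1)] = 0, E[(x^2 - 1)^2] = 3 - 2 + 1 = 2.
J11, J12, J22 = 1.0, 0.0, 2.0
H1, H2 = 0.0, 2.0  # h'(0) and h''(0)

second_order = bhattacharyya_second_order(H1, H2, J11, J12, J22)
first_order = H1 * H1 / J11  # the first-derivative bound (8) squared, s = r = 2

print(first_order, second_order)
```

In this assumed example the first-derivative bound is uninformative because $h'(0) = 0$, while the second-order expression gives 2, which is the variance at $\theta = 0$ of the unbiased estimate $x^2 - 1$.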

It is obvious how we extend Theorem 6 to obtain a similar result involving the
functions $\pi_\theta, \pi'_\theta, \pi''_\theta, \cdots, \pi^{(n)}_\theta$, for any assigned $n$. And it is thereafter clear
how, in the case $s = r = 2$, Bhattacharyya's general inequality may be deduced.

It is clear that we can proceed from Theorem 4, under suitable conditions,
to lower bounds for $\sigma_s^{\min}$ which involve integrals of the functions $\pi(x)$ (and the
corresponding integrals of $h$) as well as the derivatives of these functions.

In closing this section we note that all the above considerations apply equally
to the case $s = \infty$.

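7. Determination of the best estimate. We shall now prove the following
theorem, which provides an explicit construction of the best (at $\theta_0$) estimate of $h$.
We repeat that $s$ is now taken to be finite.

Theorem 7. Let $\mathfrak{M}_s$ be non-empty, and $\phi_0$ be the best (at $\theta_0$) unbiased estimate of $h$.
Let $\{\theta_i^n,\ i = 1, 2, \cdots, k_n\}$, $n = 1, 2, \cdots$, be a sequence of (finite) sets of points of
$\Theta$, and $\{\alpha_i^n,\ i = 1, 2, \cdots, k_n\}$, $n = 1, 2, \cdots$, a sequence of sets of real numbers,
such that

$$\lim_{n \to \infty} \frac{\left| \sum_{i=1}^{k_n} \alpha_i^n h(\theta_i^n) \right|}{\left\| \sum_{i=1}^{k_n} \alpha_i^n \pi_{\theta_i^n} \right\|_r} = c_0 = \|\phi_0\|_s = \sigma_s^{\min}.$$

Then the functions $f_n$:

$$f_n(x) = \frac{\sum_{i=1}^{k_n} \alpha_i^n h(\theta_i^n)}{\left\| \sum_{i=1}^{k_n} \alpha_i^n \pi_{\theta_i^n} \right\|_r^{\,r}} \left| \sum_{i=1}^{k_n} \alpha_i^n \pi_{\theta_i^n}(x) \right|^{r/s} \operatorname{sgn} \left( \sum_{i=1}^{k_n} \alpha_i^n \pi_{\theta_i^n}(x) \right)$$

(are elements of $\mathfrak{L}_s$, and) converge strongly in $\mathfrak{L}_s$ to $\phi_0$.
The strong convergence here means precisely that

$$\lim_{n \to \infty} \int_\Omega |f_n - \phi_0|^s\, d\nu = 0.$$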


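Clearly, we may, with no loss in generality, assume the numbers $\alpha_i^n$ to be
such that

$$\left\| \sum_{i=1}^{k_n} \alpha_i^n \pi_{\theta_i^n} \right\|_r = 1, \qquad n = 1, 2, \cdots. \tag{12}$$

We shall suppose this to be the case throughout the proof. Then the essential property
of the $\theta_i^n$ and the $\alpha_i^n$ is that

$$\lim_{n \to \infty} \left| \sum_{i=1}^{k_n} \alpha_i^n h(\theta_i^n) \right| = c_0. \tag{13}$$

And in this normalized situation, the functions $f_n$ will be given by

$$f_n(x) = \left( \sum_{i=1}^{k_n} \alpha_i^n h(\theta_i^n) \right) \left| \sum_{i=1}^{k_n} \alpha_i^n \pi_{\theta_i^n}(x) \right|^{r/s} \operatorname{sgn} \left( \sum_{i=1}^{k_n} \alpha_i^n \pi_{\theta_i^n}(x) \right). \tag{14}$$

That these functions are elements of $\mathfrak{L}_s$ is easily seen; in fact,

$$\|f_n\|_s = \left| \sum_{i=1}^{k_n} \alpha_i^n h(\theta_i^n) \right|.$$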



The proof of this theorem will consist mainly in the application of the following 
two lemmas. 

Lemma 2. Let $0 \ne \eta \in \mathfrak{L}_s$, and $\{\xi_n,\ n = 1, 2, \cdots\}$ be a sequence of functions in
$\mathfrak{L}_r$ such that

(i) $\|\xi_n\|_r = 1$, $n = 1, 2, \cdots$

(ii) $\displaystyle\lim_{n \to \infty} \int_\Omega \xi_n \eta\, d\nu = \|\eta\|_s$.

Then $\xi_n$ converges strongly in $\mathfrak{L}_r$ to the function

$$\xi_0 = \|\eta\|_s^{-s/r}\, |\eta|^{s/r} \operatorname{sgn} \eta.$$

Let us observe first that

$$\int_\Omega \xi_0 \eta\, d\nu = \|\eta\|_s \tag{15}$$

and

$$\|\xi_0\|_r = 1.$$

Furthermore, $\xi_0$ is the unique element with norm $\le 1$ in $\mathfrak{L}_r$ having the property
(15). For, if also,

$$\int_\Omega \xi'_0 \eta\, d\nu = \|\eta\|_s, \qquad \|\xi'_0\|_r \le 1,$$

we then have

$$\int_\Omega \tfrac{1}{2}(\xi_0 + \xi'_0)\, \eta\, d\nu = \|\eta\|_s;$$

and from this,

$$\tfrac{1}{2} \|\xi_0 + \xi'_0\|_r\, \|\eta\|_s \ge \|\eta\|_s.$$

That is,

$$\|\xi_0 + \xi'_0\|_r \ge 2 \ge \|\xi_0\|_r + \|\xi'_0\|_r.$$

From this, and (Minkowski)

$$\|\xi_0 + \xi'_0\|_r \le \|\xi_0\|_r + \|\xi'_0\|_r,$$

we have

$$\|\xi_0 + \xi'_0\|_r = \|\xi_0\|_r + \|\xi'_0\|_r.$$

Therefore, for some $a > 0$, $\xi'_0 = a \xi_0$. But we must have $a = 1$ if $\xi_0$ and $\xi'_0$ are
both to satisfy (15), as assumed. Hence $\xi'_0 = \xi_0$.

Now consider the sequence $\{\xi_n\}$. Choose a sub-sequence $\{\xi_{n_k}\}$ that converges
weakly to, say, $\xi'$. Then $\|\xi'\|_r \le 1$. We have

$$\int_\Omega \xi' \eta\, d\nu = \lim_{k \to \infty} \int_\Omega \xi_{n_k} \eta\, d\nu = \|\eta\|_s.$$

Hence, $\xi' = \xi_0$. And since $1 = \|\xi_{n_k}\|_r \to 1 = \|\xi_0\|_r$, it follows that $\xi_{n_k}$ converges
strongly to $\xi_0$ (cf. [13, p. 139, section 3]).

Suppose there is a subsequence $\{\xi_{n_l}\}$ of $\{\xi_n\}$ such that

$$\|\xi_{n_l} - \xi_0\|_r > \epsilon > 0, \qquad l = 1, 2, \cdots.$$

We have, nonetheless, for this subsequence, the hypotheses of our lemma
satisfied. We can therefore apply the argument of the previous paragraph to
extract a subsequence of $\{\xi_{n_l}\}$, which converges strongly to $\xi_0$. This is in obvious
contradiction to the above $\epsilon$-assumption, and the lemma is hereby proved.

Lemma 3. Lemma 2 remains true with the roles of $\mathfrak{L}_r$ and $\mathfrak{L}_s$ interchanged.

This is obvious.

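Returning now to the proof of Theorem 7, let us first, for the sake of brevity,
introduce the notation:

$$c_n = \sum_{i=1}^{k_n} \alpha_i^n h(\theta_i^n), \qquad \gamma_n = \operatorname{sgn} \sum_{i=1}^{k_n} \alpha_i^n h(\theta_i^n), \qquad \psi_n = \sum_{i=1}^{k_n} \alpha_i^n \pi_{\theta_i^n}.$$

From

$$\int_\Omega \phi_0 \pi_\theta\, d\nu = h(\theta), \qquad \theta \in \Theta,$$

we easily obtain

$$\int_\Omega \phi_0 \psi_n\, d\nu = c_n, \qquad n = 1, 2, \cdots,$$

which we may write

$$\int_\Omega \phi_0 \gamma_n \psi_n\, d\nu = |c_n|, \qquad n = 1, 2, \cdots.$$

Since $|c_n| \to \|\phi_0\|_s$ (cf. (13)) and $\|\gamma_n \psi_n\|_r = 1$, $n = 1, 2, \cdots$, (cf. (12)), we
have, by Lemma 2, that $\gamma_n \psi_n$ converges strongly to

$$\psi_0 = \|\phi_0\|_s^{-s/r}\, |\phi_0|^{s/r} \operatorname{sgn} \phi_0. \tag{16}$$

The functions (cf. (14))

$$f_n = c_n |\psi_n|^{r/s} \operatorname{sgn} \psi_n$$

obviously satisfy

$$\int_\Omega f_n \gamma_n \psi_n\, d\nu = |c_n|, \qquad n = 1, 2, \cdots.$$

And from this we conclude that

$$\lim_{n \to \infty} \int_\Omega f_n \psi_0\, d\nu = c_0,$$

or

$$\lim_{n \to \infty} \int_\Omega \frac{f_n}{|c_n|}\, \psi_0\, d\nu = 1.$$

We may apply Lemma 3 to this result, since $\| f_n / |c_n| \|_s = 1$, $n = 1, 2, \cdots$.
And we thereby conclude that $f_n / |c_n|$ converges strongly to

$$|\psi_0|^{r/s} \operatorname{sgn} \psi_0,$$

which, on substituting from the definition (16) of $\psi_0$, we find to be just

$$\frac{\phi_0}{c_0}.$$

Since $|c_n| \to c_0$, it follows immediately that $f_n$ converges strongly to $\phi_0$, and
the theorem is proved.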

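The following corollary is actually of greater use in applications than Theorem
7 itself, for the reason that it leaves no doubt about the form of $\lim f_n$ (i.e., $\phi_0$)
when we know explicitly the form of $\lim \gamma_n \psi_n$.

Corollary 7-1. Assume the hypothesis of Theorem 7. Then the functions

$$\operatorname{sgn} \left( \sum_{i=1}^{k_n} \alpha_i^n h(\theta_i^n) \right) \cdot \frac{\sum_{i=1}^{k_n} \alpha_i^n \pi_{\theta_i^n}}{\left\| \sum_{i=1}^{k_n} \alpha_i^n \pi_{\theta_i^n} \right\|_r}$$

converge strongly, in $\mathfrak{L}_r$, to a function $\psi_0$, and

$$\phi_0 = c_0 |\psi_0|^{r/s} \operatorname{sgn} \psi_0.$$

This is clear from the proof of the theorem.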

By way of illustrating the application of these results, we shall prove the 
following theorem. 

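Theorem 8. Assume the hypothesis of Theorem 5. And, further, let the equality
sign hold in (8). Then,

$$\phi_0(x) = \frac{h'(\theta_0)}{\|\pi'_{\theta_0}\|_r^{\,r}}\, |\pi'_{\theta_0}(x)|^{r/s} \operatorname{sgn} \pi'_{\theta_0}(x).$$

Since (8) is an equality, we may, under the hypothesis of Theorem 5, consider
that we have

$$c_0 = \lim_{n \to \infty} \frac{\left| \dfrac{h(\rho_n) - h(\theta_0)}{\rho_n - \theta_0} \right|}{\left\| \dfrac{\pi_{\rho_n} - \pi_{\theta_0}}{\rho_n - \theta_0} \right\|_r}, \tag{17}$$

where $\{\rho_n\}$ is a sequence in $\Theta$ converging to $\theta_0$. The numerator of the right-hand
side of (17), sans the vertical bars, converges to $h'(\theta_0)$ (which is $\ne 0$, since
$c_0 \ne 0$); hence, for all sufficiently large $n$, that expression has the signum of
$h'(\theta_0)$. The functions whose norms appear in the denominator of (17) we know
to converge strongly in $\mathfrak{L}_r$ to $\pi'_{\theta_0}$ (by the hypothesis of Theorem 5). Hence, for
this case, the function $\psi_0$ of Corollary 7-1 is

$$\psi_0 = \operatorname{sgn} h'(\theta_0) \cdot \frac{\pi'_{\theta_0}}{\|\pi'_{\theta_0}\|_r}.$$

Therefore, by the same corollary,

$$\phi_0(x) = c_0 \left| \frac{\pi'_{\theta_0}(x)}{\|\pi'_{\theta_0}\|_r} \right|^{r/s} \operatorname{sgn} h'(\theta_0) \cdot \operatorname{sgn} \pi'_{\theta_0}(x) = \frac{h'(\theta_0)}{\|\pi'_{\theta_0}\|_r^{\,r}}\, |\pi'_{\theta_0}(x)|^{r/s} \operatorname{sgn} \pi'_{\theta_0}(x).$$

And this is the result asserted in the theorem.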

The reader will have no difficulty in establishing, in the exact pattern of the 
preceding proof, the following. 

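Theorem 9. Assume the hypothesis of Theorem 6. And, further, let the equality
sign hold in (9) for $b = b_0$, $c = c_0$.* Then,

$$\phi_0(x) = \frac{b_0 h'(\theta_0) + c_0 h''(\theta_0)}{\|b_0 \pi'_{\theta_0} + c_0 \pi''_{\theta_0}\|_r^{\,r}}\, |b_0 \pi'_{\theta_0}(x) + c_0 \pi''_{\theta_0}(x)|^{r/s} \operatorname{sgn} \left( b_0 \pi'_{\theta_0}(x) + c_0 \pi''_{\theta_0}(x) \right).$$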


It is evident that results of the type in these theorems may be built up as 
well with integrals over the parameter space. 

A question of considerable practical importance is that of the rapidity of
convergence of the $f_n$ to $\phi_0$. An answer to this question, on the level of generality
we are maintaining in this study, consists in relating this convergence to that
of the $|c_n|$ to $c_0$. In the case $s = r = 2$, the answer is immediate and exact:

$$\begin{aligned} \|f_n - \phi_0\|_2^2 &= \int_\Omega (f_n - \phi_0)^2\, d\nu \\ &= \int_\Omega f_n^2\, d\nu - 2 \int_\Omega \phi_0 f_n\, d\nu + \int_\Omega \phi_0^2\, d\nu \\ &= |c_n|^2 - 2|c_n|^2 + c_0^2 \\ &= c_0^2 - |c_n|^2. \end{aligned}$$

Thus, if one unbiased estimate is known, it provides, since its norm is $\ge c_0$,
an upper bound for $\|f_n - \phi_0\|_2$. The same is true in the general case (any $s$)
once we have established an upper bound, depending on $c_0$ and $|c_n|$, for
$\|f_n - \phi_0\|_s$. But in the general case, a good upper bound does not seem to be
so close at hand. There are indications of the direction in which one must proceed,
and we hope to draw some significant results out of these before long.
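
The identity $\|f_n - \phi_0\|_2^2 = c_0^2 - |c_n|^2$ for $s = r = 2$ can be checked numerically in a simple case. The sketch below is an illustration under assumptions that are not part of the text at this point: a single observation from $N(\theta, 1)$, $\theta_0 = 0$, $h(\theta) = \theta$ (for which $\phi_0(x) = x$ and $c_0 = 1$), and an approximant built from the two points $0$ and $\rho = 0.5$ with coefficients $-1$ and $1$. The helper names and the trapezoidal quadrature are likewise only illustrative choices.

```python
import math


def normal_pdf(x):
    """Density of N(0, 1), i.e. of the measure nu in this assumed example."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def integrate(func, lo=-12.0, hi=12.0, steps=24000):
    """Composite trapezoidal rule for the integral of func(x) against N(0, 1)."""
    width = (hi - lo) / steps
    total = 0.5 * (func(lo) * normal_pdf(lo) + func(hi) * normal_pdf(hi))
    for k in range(1, steps):
        x = lo + k * width
        total += func(x) * normal_pdf(x)
    return total * width


RHO = 0.5                                        # assumed second point; theta0 = 0
C0 = 1.0                                         # norm of phi0(x) = x under N(0, 1)
C_N = RHO / math.sqrt(math.expm1(RHO * RHO))     # |c_n| for the pair (0, RHO)


def f_n(x):
    """Approximant c_n * psi_n with psi_n = (pi_rho - pi_0) / ||pi_rho - pi_0||_2."""
    pi_rho = math.exp(RHO * x - 0.5 * RHO * RHO)
    return RHO * (pi_rho - 1.0) / math.expm1(RHO * RHO)


lhs = integrate(lambda x: (f_n(x) - x) ** 2)     # ||f_n - phi0||_2^2 by quadrature
rhs = C0 ** 2 - C_N ** 2                         # c0^2 - |c_n|^2

print(lhs, rhs)
```

The two printed numbers agree to quadrature accuracy, which is all the sketch is meant to show; it says nothing about the general case $s \ne 2$.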


8. The case s = r = 2. The particular aspects of this case (where bestness
of an estimate has reference to its variance), which arise out of the coincidence of
$\mathfrak{L}_r$ and $\mathfrak{L}_s$, merit some discussion. We shall denote the inner product, $\int_\Omega \xi \eta\, d\nu$, of
two functions $\xi$ and $\eta$ in $\mathfrak{L}_2$, as usual by $(\xi, \eta)$. Let $\{\mathfrak{P}_0\}$ denote the closed linear
manifold in $\mathfrak{L}_2$ spanned by the $\pi_\theta$.

Theorem 10. Let $\mathfrak{M}_2$ be non-empty. Then $\phi_0$ is the unique element of $\mathfrak{M}_2$ which
lies in $\{\mathfrak{P}_0\}$.

* In the case $s = 2$, $b_0$ and $c_0$ are the values which render (11) an equality.

To begin with it is clear that the functions $f_n$ of Theorem 7, in the present case
$s = r = 2$, are all elements of $[\mathfrak{P}_0]$, the linear manifold spanned by the $\pi_\theta$.
Hence, since $\phi_0$ is the strong limit of these elements, $\phi_0 \in \{\mathfrak{P}_0\}$.

Now suppose also $\phi_1 \in \mathfrak{M}_2$, $\phi_1 \in \{\mathfrak{P}_0\}$. Then, from

$$(\phi_0, \pi_\theta) = h(\theta), \qquad \theta \in \Theta,$$

$$(\phi_1, \pi_\theta) = h(\theta), \qquad \theta \in \Theta,$$

we have

$$(\phi_1 - \phi_0, \pi_\theta) = 0, \qquad \theta \in \Theta,$$

and, by continuity of the inner product,

$$(\phi_1 - \phi_0, \xi) = 0, \qquad \xi \in \{\mathfrak{P}_0\};$$

that is, $\phi_1 - \phi_0 \in \{\mathfrak{P}_0\}^\perp$. But, from $\phi_0 \in \{\mathfrak{P}_0\}$ and $\phi_1 \in \{\mathfrak{P}_0\}$ it follows that
$\phi_1 - \phi_0 \in \{\mathfrak{P}_0\}$. Hence $\phi_1 - \phi_0 = 0$, and this proves the exclusiveness of the
property for $\phi_0$.

Another characterization of $\phi_0$ is given by the following corollary.

Corollary 10-1. If $\mathfrak{M}_2$ is non-empty, then $\phi_0$ is the unique element of $\mathfrak{M}_2$ which
satisfies the system of equations in $\xi$: $(\phi, \xi) = \|\xi\|_2^2$, $\phi \in \mathfrak{M}_2$.

To see that $\phi_0$ has the asserted property, let $\phi$ be any element of $\mathfrak{M}_2$, and set
$\phi = \xi + \eta$, with $\xi \in \{\mathfrak{P}_0\}$ and $\eta \in \{\mathfrak{P}_0\}^\perp$. From

$$(\xi, \pi_\theta) = (\xi + \eta, \pi_\theta) = (\phi, \pi_\theta) = h(\theta),$$

it follows that $\xi \in \mathfrak{M}_2$. Hence $\xi = \phi_0$. And so,

$$(\phi, \phi_0) = (\phi_0 + \eta, \phi_0) = \|\phi_0\|_2^2.$$

If $\phi_1 \in \mathfrak{M}_2$ has this property also, then both

$$(\phi_1, \phi_0) = \|\phi_0\|_2^2$$

and

$$(\phi_0, \phi_1) = \|\phi_1\|_2^2;$$

and therefore

$$\|\phi_1\|_2^2 = \|\phi_0\|_2^2.$$

This proves $\phi_1 = \phi_0$, and so the corollary.

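9. An example. Let $\Omega$ be Euclidean $n$-space, $x = (x_1, x_2, \cdots, x_n)$; $\mu$, Lebesgue
measure; $\Theta$, the set of real numbers; and

$$p_\theta(x) = (2\pi)^{-n/2} \exp \left\{ -\tfrac{1}{2} \sum_{i=1}^{n} (x_i - \theta)^2 \right\}.$$

And finally, let $\theta_0 = 0$. Then

$$\pi_\theta(x) = \exp \left\{ \theta \sum_{i=1}^{n} x_i - \tfrac{1}{2} n \theta^2 \right\}.$$

If $0 < b < \tfrac{1}{2}$, and we define

$$\phi_1(x) = (1 - 2b)^{n/2} \exp \left\{ b \sum_{i=1}^{n} x_i^2 \right\} - 1,$$

we have, for each $\theta$,

$$\int_\Omega \phi_1(x)\, p_\theta(x)\, d\mu = \exp \left\{ \frac{nb}{1 - 2b}\, \theta^2 \right\} - 1.$$

Thus, $\phi_1$ is an unbiased estimate of the function $h$:

$$h(\theta) = \exp \left\{ \frac{nb}{1 - 2b}\, \theta^2 \right\} - 1.$$

If we examine

$$\|\phi_1\|_s^s = \int_\Omega \left| (1 - 2b)^{n/2} \exp \left\{ b \sum_{i=1}^{n} x_i^2 \right\} - 1 \right|^s (2\pi)^{-n/2} \exp \left\{ -\tfrac{1}{2} \sum_{i=1}^{n} x_i^2 \right\} d\mu$$

we find that this integral converges only for $s < 1/2b$. Shifting the emphasis,
we may state: for the function $h$, defined by

$$h(\theta) = e^{a\theta^2} - 1, \qquad a > 0,$$

there exists an unbiased estimate with finite $s$th moment at $\theta = 0$, for each

$$s < \frac{n + 2a}{2a}.$$

Next, observe that

$$\|\pi_\theta\|_r^r = (2\pi)^{-n/2} \int_\Omega \exp \left\{ r\theta \sum_{i=1}^{n} x_i - \tfrac{1}{2} n r \theta^2 - \tfrac{1}{2} \sum_{i=1}^{n} x_i^2 \right\} d\mu = \exp \left\{ \tfrac{1}{2} n r (r - 1) \theta^2 \right\},$$

so that the $\pi_\theta$ are elements of $\mathfrak{L}_r$ for each $r > 1$. The ratio

$$\frac{|h(\theta)|}{\|\pi_\theta\|_r} = (e^{a\theta^2} - 1) \exp \left\{ -\tfrac{1}{2} n (r - 1) \theta^2 \right\}$$

is seen to diverge as $\theta \to \infty$, if

$$\tfrac{1}{2} n (r - 1) < a.$$

Hence, by Theorem 2, there exists no unbiased estimate of $h$ belonging to $\mathfrak{L}_s$
for a value of $s$ such that the number

$$r = \frac{s}{s - 1}$$

satisfies the inequality just above; that is, for a value of $s$ greater than

$$\frac{n + 2a}{2a}.$$

Otherwise stated: there exists no unbiased estimate of $h$ with finite $s$th moment at
$\theta = 0$, for

$$s > \frac{n + 2a}{2a}.$$

It is most likely true that this last statement holds, in general, with

$$s = \frac{n + 2a}{2a}.$$

We shall consider here only the case

$$\frac{n + 2a}{2a} = 2,$$

and since the analysis is the same for every pair $n$, $a$ satisfying this equality,
we treat the particular case of

$$n = 1, \qquad a = \tfrac{1}{2}.$$

Thus, we shall show: for $n = 1$, there exists no unbiased estimate of $h_1$,

$$h_1(\theta) = e^{\theta^2/2} - 1,$$

with finite variance at $\theta = 0$.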

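We must show that the ratios

$$\frac{\left| \sum_{i=1}^{m} a_i (e^{\theta_i^2/2} - 1) \right|}{\left\| \sum_{i=1}^{m} a_i \pi_{\theta_i} \right\|_2}$$

are not bounded for all choices of $m$ (distinct) $\theta_i$, and all sets of $m$ real numbers
$a_i$, and all $m$. This is clearly equivalent to showing the same for the ratios

$$Q(m, a_i, \theta_i) = \frac{\left| \sum_{i=1}^{m} a_i (1 - e^{-\theta_i^2/2}) \right|}{\left\| \sum_{i=1}^{m} a_i e^{-\theta_i^2/2} \pi_{\theta_i} \right\|_2}.$$

Now we find, by direct computation,

$$Q^2(m, a_i, \theta_i) = \frac{\left[ \sum_{i=1}^{m} a_i (1 - e^{-\theta_i^2/2}) \right]^2}{\sum_{i,j=1}^{m} a_i a_j e^{-(\theta_i - \theta_j)^2/2}}.$$

And the solution of the familiar extremum problem:

$$\sup_{(c_i)} \sum_{i=1}^{m} c_i (1 - e^{-\theta_i^2/2}) \quad \text{subject to} \quad \sum_{i,j=1}^{m} c_i c_j e^{-(\theta_i - \theta_j)^2/2} = 1$$

yields

$$\sup_{(a_i)} Q^2(m, a_i, \theta_i) = \sum_{i,j=1}^{m} V^{ij} (1 - e^{-\theta_i^2/2})(1 - e^{-\theta_j^2/2}),$$

where the matrix

$$V = (V^{ij}), \qquad i, j = 1, 2, \cdots, m,$$

is the inverse of the matrix

$$U = (e^{-(\theta_i - \theta_j)^2/2}), \qquad i, j = 1, 2, \cdots, m.$$

We now take, in particular,

$$\theta_i = it, \qquad i = 1, 2, \cdots, m,$$

where $t$ is a positive number. Clearly, there exists a number $t_0$ such that for
$t > t_0$,

$$U(t) = (e^{-(i - j)^2 t^2/2})$$

is non-singular. Also,

$$\lim_{t \to \infty} U(t) = I,$$

the identity matrix. Then, for $t > t_0$, $V = U^{-1}$ is a continuous function of $U$,
so that

$$\lim_{t \to \infty} V(t) = \left( \lim_{t \to \infty} U(t) \right)^{-1} = I.$$

Hence,

$$\lim_{t \to \infty} V^{ij} = \delta_{ij}.$$

It follows that

$$\lim_{t \to \infty} \sup_{(a_i)} Q^2(m, a_i, it) = m,$$

and therefore,

$$\sup_{(a_i), (\theta_i)} Q^2(m, a_i, \theta_i) \ge m.$$

(A simple argument on the characteristic values of $U$ shows that there is actually
equality here.) This result gives the unboundedness of the ratios $Q$, and our
proposition is proved, by virtue of Theorem 2.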
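
The limiting behaviour used in this argument is easy to observe numerically. The following sketch assumes the forms $U = (e^{-(\theta_i - \theta_j)^2/2})$ and $g_i = 1 - e^{-\theta_i^2/2}$ for the matrix and the vector of the extremum problem with $n = 1$, $a = \tfrac{1}{2}$, and evaluates the quadratic form $g' U^{-1} g$ for $\theta_i = it$. The function names and the small elimination routine are illustrative only.

```python
import math


def solve(matrix, rhs):
    """Solve matrix * y = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * y[c] for c in range(r + 1, n))
        y[r] = (aug[r][n] - tail) / aug[r][r]
    return y


def sup_q_squared(m, t):
    """Supremum over (a_i) of Q^2 for theta_i = i * t, i.e. g' U^{-1} g."""
    thetas = [i * t for i in range(1, m + 1)]
    g = [1.0 - math.exp(-0.5 * th * th) for th in thetas]
    u = [[math.exp(-0.5 * (a - b) ** 2) for b in thetas] for a in thetas]
    y = solve(u, g)
    return sum(gi * yi for gi, yi in zip(g, y))


if __name__ == "__main__":
    # For widely separated points the supremum approaches m, so it grows without bound in m.
    for m in (1, 2, 5, 10):
        print(m, sup_q_squared(m, 6.0))
```

With $t = 6$ the off-diagonal entries of $U$ are already of order $10^{-8}$, so the printed values are numerically indistinguishable from $m$.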


APPENDIX 

The spaces $\mathfrak{L}_r$ and $\mathfrak{L}_s$ are instances of a Banach space over the reals; that is, a
complete, normed, linear vector space, closed under multiplication by real
numbers. That the space, say $\mathfrak{B}$, is normed is to say that there is a non-negative,
real-valued function, $\|\ \|$, defined on $\mathfrak{B}$, with the properties:

$\|\xi\| = 0$ if and only if $\xi$ is the null vector,

$\|a\xi\| = |a| \cdot \|\xi\|$,

$\|\xi + \eta\| \le \|\xi\| + \|\eta\|$;

where $\xi, \eta \in \mathfrak{B}$ and $a$ is real. The number $\|\xi\|$ is called the norm of $\xi$.

The function $\|\xi - \eta\|$ on pairs of vectors is a distance function in the
usual sense. With it, strong convergence (or simply convergence) is defined in $\mathfrak{B}$:
$\xi_n$ converges strongly to $\xi$ when $\lim_{n \to \infty} \|\xi_n - \xi\| = 0$. In symbols: $\xi_n \to \xi$ or $\lim \xi_n = \xi$.
The usual set-theoretic notions are now defined in the obvious way; e.g., limit
point of a set, closed set, etc. That the space $\mathfrak{B}$ is complete means that every
sequence $\{\xi_n\}$ satisfying $\lim_{m,n \to \infty} \|\xi_m - \xi_n\| = 0$ converges to a (unique) element
$\xi \in \mathfrak{B}$.

A linear manifold in $\mathfrak{B}$ is a subset $\mathfrak{M}$ of $\mathfrak{B}$ with the property that for any two
elements $\xi, \eta \in \mathfrak{M}$ and any two real numbers $a$, $b$, we have also $a\xi + b\eta \in \mathfrak{M}$.
A closed linear manifold is a linear manifold that is closed in the set-theoretic
sense. If $S$ is any subset of $\mathfrak{B}$ then the set, $[S]$, of all finite linear combinations of
elements of $S$ is a linear manifold; it is the linear manifold spanned by $S$. The
closure of $[S]$, denoted by $\{S\}$, is called the closed linear manifold spanned by $S$.
In general, $[S]$ is a proper subset of $\{S\}$. A set $S \subset \mathfrak{B}$ is called fundamental
when $\{S\} = \mathfrak{B}$.

A linear functional, $G$, on $\mathfrak{B}$ is a real-valued function with the property
that for any two elements $\xi, \eta \in \mathfrak{B}$ and any two real numbers $a$, $b$, we have
$G(a\xi + b\eta) = aG(\xi) + bG(\eta)$. The linear functional $G$ is said to be bounded when
the number

$$\|G\| = \underset{\xi \ne 0}{\mathrm{l.u.b.}}\ \frac{|G(\xi)|}{\|\xi\|}$$

is finite. $\|G\|$ is called the norm of $G$. (Throughout the text of the paper, the
qualification "bounded" has been understood in all references to linear functionals.)
If we define the sum of two linear functionals $F$ and $G$ by $(F + G)(\xi) = F(\xi) + G(\xi)$,
and make the other requisite definitions in the obvious way,
we find that the bounded linear functionals on $\mathfrak{B}$ form a linear vector space
over the reals. The function $\|\ \|$ on the bounded linear functionals, which we
have already called a norm, is in fact a norm in the Banach space sense. This
vector space, so normed, is readily shown to be complete. Hence it is a Banach
space, usually called the conjugate space to $\mathfrak{B}$. It is this space we have referred
to in the text as the space of linear functionals on $\mathfrak{B}$.

If a sequence $\{\xi_n\}$ of elements of $\mathfrak{B}$ has the property that $\lim_{n \to \infty} G(\xi_n) = G(\xi)$
for every bounded linear functional $G$, then $\xi_n$ is said to converge weakly to $\xi$.
If, of the sequence $\{\xi_n\}$, we know only that $\lim_{n \to \infty} G(\xi_n)$ exists for every bounded
linear functional, we say simply that the sequence is weakly convergent. The
space $\mathfrak{B}$ is called weakly complete if every weakly convergent sequence converges
weakly to a limit. The spaces $\mathfrak{L}_r$, $r \ge 1$, are weakly complete. $\mathfrak{B}$ is said to be
weakly compact if every bounded set $S \subset \mathfrak{B}$ contains a weakly convergent
sequence. That $S$ is "bounded" means $\mathrm{l.u.b.}_{\xi \in S} \|\xi\| < \infty$.

A real Hilbert space $\mathfrak{H}$ is a real Banach space on which there is defined an
inner product; that is, a function $(\xi, \eta)$ on pairs of elements $\xi$, $\eta$, with the
properties

$(\xi, \eta) = (\eta, \xi)$,

$(a\xi, \eta) = a(\xi, \eta)$,

$(\xi + \zeta, \eta) = (\xi, \eta) + (\zeta, \eta)$,

$\|\xi\|^2 = (\xi, \xi)$.

The inner product is a continuous function of both its arguments; i.e., $\lim \xi_m = \xi$
and $\lim \eta_n = \eta$ imply $\lim (\xi_m, \eta_n) = (\xi, \eta)$. The space $\mathfrak{L}_2$ in the text is a Hilbert
space when we take $(\xi, \eta) = \int_\Omega \xi \eta\, d\nu$. Two elements $\xi$, $\eta$ which are such that
$(\xi, \eta) = 0$ are said to be orthogonal. If $S$ is any set in $\mathfrak{H}$ then the set of elements
of $\mathfrak{H}$ each of which is orthogonal to every element of $S$ is called the orthocomplement
of $S$, and is denoted by $S^\perp$.

For further elaboration the reader is referred to [13] and [19].

REFERENCES 

[1] D. Blackwell and M. A. Girshick, "A lower bound for the variance of some unbiased
sequential estimates," Annals of Math. Stat., Vol. 18 (1947), pp. 277-280.

[2] R. A. Fisher, "Theory of statistical estimation," Camb. Phil. Soc. Proc., Vol. 22
(1925), pp. 700-725.

[3] H. Cramér, Mathematical Methods of Statistics, Princeton Press, Princeton, 1946.

[4] A. Bhattacharyya, "On some analogues of the amount of information and their use
in statistical estimation," Sankhyā, Vol. 8 (1946), pp. 1-14.

[5] M. A. Girshick, F. Mosteller, and L. J. Savage, "Unbiased estimates for certain
binomial sampling problems," Annals of Math. Stat., Vol. 17 (1946), pp. 13-23.

[6] J. Neyman, "Su un teorema concernente le cosidette statistiche sufficienti," Giornale
dell'Istituto Italiano degli Attuari, Vol. 6 (1935), pp. 320-334.

[7] D. Blackwell, "Conditional expectation and unbiased sequential estimation,"
Annals of Math. Stat., Vol. 18 (1947), pp. 105-110.

[8] H. Cramér, "A contribution to the theory of statistical estimation," Skand. Aktuar.
tids., Vol. 29 (1946), pp. 85-94.

[9] A. Bhattacharyya, "On some analogues of the amount of information and their use
in statistical estimation (cont'd)," Sankhyā, Vol. 8 (1947), pp. 201-218.

[10] J. Wolfowitz, "The efficiency of sequential estimates and Wald's equation for
sequential processes," Annals of Math. Stat., Vol. 18 (1947), pp. 216-230.

[11] F. Riesz, Les Systèmes d'Équations Linéaires à une Infinité d'Inconnues, Gauthier-Villars,
Paris, 1913.

[12] F. Riesz, "Untersuchungen über Systeme integrierbarer Funktionen," Math. Annalen,
Vol. 69 (1910), pp. 449-497.

[13] S. Banach, Théorie des Opérations Linéaires, Garasinski, Warsaw, 1932.

[14] N. Dunford, "Uniformity in linear spaces," Am. Math. Soc. Trans., Vol. 44 (1938),
pp. 305-356.

[15] M. H. Steinhaus, "Additive und stetige Funktionaloperationen," Math. Zeits., Vol.
5 (1918), pp. 186-221.

[16] F. Riesz, "Sur une espèce de géométrie analytique des systèmes de fonctions
sommables," Comptes Rendus, Vol. 144 (1907), pp. 1409-1411.

[17] M. Fréchet, "Sur les ensembles de fonctions et les opérations linéaires," Comptes
Rendus, Vol. 144 (1907), pp. 1414-1416.

[18] S. Saks, Theory of the Integral, Stechert, New York, 1937.

[19] J. von Neumann, Functional Operators, (Mimeographed notes) Princeton, 1935.

[20] G. R. Seth, "On the variance of estimates," Annals of Math. Stat., Vol. 20 (1949),
pp. 1-27.

[21] B. J. Pettis, "A note on regular Banach spaces," Am. Math. Soc. Bull., Vol. 44 (1938),
pp. 420-428.



A SEQUENTIAL DECISION PROCEDURE FOR CHOOSING ONE OF 
THREE HYPOTHESES CONCERNING THE UNKNOWN 
MEAN OF A NORMAL DISTRIBUTION 

By Milton Sobel and Abraham Wald¹

Columbia University 

1. Introduction. In this paper a multi-decision problem is investigated from
a sequential viewpoint and compared with the best non-sequential procedure
available. Multi-decision problems occur often in practice but methods to deal
with such problems are not yet sufficiently developed.

The problem under consideration here is a 3-decision problem: Given a chance
variable which is normally distributed with known variance $\sigma^2$ but unknown
mean $\theta$, and given two real numbers $a_1 < a_2$, the problem is to choose one of the
three mutually exclusive and exhaustive hypotheses

$$H_1: \theta < a_1 \qquad H_2: a_1 \le \theta \le a_2 \qquad H_3: \theta > a_2.$$

In order to select a proper sequential decision procedure, the parameter space
is subdivided into 5 mutually exclusive and exhaustive zones in the following
manner. Around $a_1$ there exists an interval $(\theta_1, \theta_2)$ in which we have no strong
preference between $H_1$ and $H_2$ but prefer (strongly) to reject $H_3$. Around $a_2$
there exists an interval $(\theta_3, \theta_4)$ in which we have no strong preference between
$H_2$ and $H_3$ but prefer (strongly) to reject $H_1$. For $\theta \le \theta_1$ we prefer to accept $H_1$.
For $\theta_2 \le \theta \le \theta_3$ we prefer to accept $H_2$. For $\theta \ge \theta_4$ we prefer to accept $H_3$.

The intervals $(\theta_1, \theta_2)$ and $(\theta_3, \theta_4)$ will be called indifference zones. The
determination of these indifference zones is not a statistical problem but should
be made on practical considerations concerning the consequences of a wrong
decision.

In accordance with the above we define a wrong decision in the following
way. For $\theta \le \theta_1$, acceptance of $H_2$ or $H_3$ is wrong. For $\theta_1 < \theta < \theta_2$, acceptance of
$H_3$ is wrong. For $\theta_2 \le \theta \le \theta_3$, acceptance of $H_1$ or $H_3$ is wrong. For $\theta_3 < \theta < \theta_4$,
acceptance of $H_1$ is wrong. For $\theta \ge \theta_4$, acceptance of $H_1$ or $H_2$ is wrong.

The requirements on our decision procedure necessary to limit the probability
of a wrong decision are investigated. Two cases are considered.

Case 1: Prob. of a wrong decision $\le \gamma$ for all $\theta$.

Case 2: Prob. of a wrong decision $\le \gamma_1$ for $\theta \le \theta_1$,
        Prob. of a wrong decision $\le \gamma_2$ for $\theta_1 < \theta < \theta_4$,
        Prob. of a wrong decision $\le \gamma_3$ for $\theta \ge \theta_4$.

¹ Work done under the sponsorship of the Office of Naval Research.

The decision procedure discussed in the present paper is not an optimum
procedure since, as will be seen later, the final decision at the termination of
experimentation is not in every case a function of only "the sample mean of all
the observations", although the sample mean is a sufficient statistic for $\theta$.
Although the procedure considered is not optimal it is suggested for the following
reasons:

1. The decision procedure can be carried out simply. In fact tables can be
constructed before experimentation starts that render the procedure completely
mechanical.

2. The derivation of the operating characteristic (OC) function, neglecting the
excess of the cumulative sum over the boundary, is accomplished with little
difficulty. In general, for other multi-decision problems it is unknown how to
obtain the OC function.

3. It is believed that the loss of efficiency is not serious; i.e., the suggested
sequential procedure is not far from being optimum. In this connection a
non-sequential procedure is compared with this sequential procedure. The results
show that, for the same maximum probability of making a wrong decision, the
sequential procedure requires on the average substantially fewer observations to
reach a final decision. In fact, for Case 1 noted above, if $.008 < \gamma < .1$, and if
certain symmetrical features are assumed, then the fixed number of observations
required by the non-sequential method is greater than the maximum of the
average sample number (ASN) function taken over all values of $\theta$.

It was found necessary in the course of the investigation to put an upper bound 

on the quantity - - in order that the methods used to obtain upper and lower 

fl 2 — 0,1 

bounds for the ASN function should give close results. This restriction, however, 
is likely to be satisfied in practical applications. 

All formulas for ASN and OC functions which will be used in this paper will be 
approximation formulas neglecting the excess of the cumulative sum over the 
boundaries. Nevertheless, equality signs will be used in these formulas, except 
when additional approximations are involved. 

2. Description of the Decision Procedure.* We shall assume that the indifference
zones described above have the following properties

(i) $\theta_1 < a_1 < \theta_2 \le \theta_3 < a_2 < \theta_4$

(ii) $\theta_1 + \theta_2 = 2a_1$; $\theta_3 + \theta_4 = 2a_2$

(iii) $\theta_2 - \theta_1 = \theta_4 - \theta_3 = \Delta$ (say).

* A similar decision procedure was used by P. Armitage [2] as an alternative to the 
sequential i test (with 2-sided alternatives). The form used there is more restricted as-he 
considers only the case dj =» . Essential inequalities on the OC function are pointed out 

but no attempt is made to determine the complete OC and ASN functions. A closely related 
but somewhat different procedure for dealing with a trichotomy was suggested by Milton 
Friedman while he was a member of the Statistical Research Group of Columbia University. 
As far as the authors are aware, no results were obtained concerning the OC and ASN func¬ 
tions of Friedman’s procedure. 



Let $R_1$ denote the Sequential Probability Ratio Test for testing the hypothesis
that $\theta = \theta_1$ against the hypothesis that $\theta = \theta_2$. We assume for the present that
either the proper constants $A$, $B$ in the probability ratio test are given or that
they are approximated from given $\alpha$, $\beta$ by the relations

$$A = \frac{1 - \beta}{\alpha}, \qquad B = \frac{\beta}{1 - \alpha}.$$

Here $\alpha$ and $\beta$ are upper bounds on the probabilities of first and second types of
errors, respectively.

Let $R_2$ represent the S.P.R.T. for testing the hypothesis that $\theta = \theta_3$ against the
alternative that $\theta = \theta_4$. For this test we assume that $(\alpha, \beta, A, B)$ are replaced
by $(\hat\alpha, \hat\beta, \hat A, \hat B)$ and as above that either $\hat A$ and $\hat B$ are given or that they are
approximated from given $\hat\alpha$, $\hat\beta$.

The decision procedure is carried out as follows: 

Both $R_1$ and $R_2$ are computed at each stage of the inspection until

Either: One ratio leads to a decision to stop before the other. Then the former
is no longer computed and the latter is continued until it leads to a decision to
stop.

Or: Both $R_1$ and $R_2$ lead to a decision to stop at the same stage. In this event
both computations are discontinued.

The following table gives the rule $R$ for the decisions to be made corresponding
to all possible outcomes of $R_1$ and $R_2$.


        $R_1$                        $R_2$                         $R$

If   accepts $\theta_1$    and    accepts $\theta_3$    then    accepts $H_1$
If   accepts $\theta_2$    and    accepts $\theta_3$    then    accepts $H_2$
If   accepts $\theta_2$    and    accepts $\theta_4$    then    accepts $H_3$


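We shall show that acceptance of both $\theta_1$ and $\theta_4$ is impossible when $(\hat A, \hat B) =
(A, B)$. For this purpose we need the acceptance number and rejection number
formulas. (See page 119 of [1]).

$$R_1: \quad \underbrace{\frac{\sigma^2}{\Delta} \log B + a_1 n}_{\text{Acceptance Number}} \;<\; \sum_{\alpha=1}^{n} x_\alpha \;<\; \underbrace{\frac{\sigma^2}{\Delta} \log A + a_1 n}_{\text{Rejection Number}}$$

$$R_2: \quad \underbrace{\frac{\sigma^2}{\Delta} \log \hat B + a_2 n}_{\text{Acceptance Number}} \;<\; \sum_{\alpha=1}^{n} x_\alpha \;<\; \underbrace{\frac{\sigma^2}{\Delta} \log \hat A + a_2 n}_{\text{Rejection Number}}.$$

We shall assume for convenience that "between observations" $R_1$ is tested before
$R_2$ and let the term "initial decision" refer to the first decision made.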

Assume $\theta_1$ and $\theta_4$ are both accepted. Then if $\theta_1$ is accepted initially at the $m$th
stage

$$\sum_{\alpha=1}^{m} x_\alpha \le \frac{\sigma^2}{\Delta} \log B + a_1 m.$$

Since

$$\frac{\sigma^2}{\Delta} \log B + a_1 m < \frac{\sigma^2}{\Delta} \log B + a_2 m$$

it follows that $\theta_4$ is rejected at the same stage, contradicting the hypothesis.
Similarly if $\theta_4$ is accepted initially at the $m$th stage, then

$$\sum_{\alpha=1}^{m} x_\alpha \ge \frac{\sigma^2}{\Delta} \log A + a_2 m.$$

Since

$$\frac{\sigma^2}{\Delta} \log A + a_2 m > \frac{\sigma^2}{\Delta} \log A + a_1 m$$

it follows that $\theta_1$ is rejected at the same or at an earlier stage, contradicting the
assumption that the acceptance of $\theta_4$ is an initial decision. Hence $\theta_1$ and $\theta_4$ cannot
both be accepted.

A geometrical representation of the rule R is given in Figure 1. 

$R$ can now be described as follows: Continue taking observations until an
acceptance region (shaded area) is reached or both dashed lines are crossed. In
the former case, stop and accept as shown above. In the latter case stop and
accept $H_2$.
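
The rule $R$ lends itself to a direct mechanical implementation. The following is a minimal sketch, assuming the acceptance and rejection numbers $(\sigma^2/\Delta)\log B + a_j n$ and $(\sigma^2/\Delta)\log A + a_j n$ for the cumulative sum; the function name, argument names and return convention are hypothetical and serve only the illustration.

```python
import math


def sobel_wald_decision(observations, a1, a2, delta, sigma2, A, B, A_hat=None, B_hat=None):
    """Apply rule R to a sequence of observations.

    Returns (hypothesis, stage), or (None, number_of_observations) if the data
    run out before both component tests R1 and R2 have stopped.
    """
    A_hat = A if A_hat is None else A_hat
    B_hat = B if B_hat is None else B_hat
    scale = sigma2 / delta
    outcome1 = outcome2 = None      # what R1 and R2 accepted, once they have stopped
    total = 0.0
    n = 0
    for n, x in enumerate(observations, start=1):
        total += x
        if outcome1 is None:        # R1: theta1 against theta2
            if total <= scale * math.log(B) + a1 * n:
                outcome1 = "theta1"
            elif total >= scale * math.log(A) + a1 * n:
                outcome1 = "theta2"
        if outcome2 is None:        # R2: theta3 against theta4
            if total <= scale * math.log(B_hat) + a2 * n:
                outcome2 = "theta3"
            elif total >= scale * math.log(A_hat) + a2 * n:
                outcome2 = "theta4"
        if outcome1 is not None and outcome2 is not None:
            table = {("theta1", "theta3"): "H1",
                     ("theta2", "theta3"): "H2",
                     ("theta2", "theta4"): "H3"}
            if (outcome1, outcome2) not in table:
                raise ValueError("theta1 and theta4 both accepted; the constants "
                                 "violate the restriction assumed in the text")
            return table[(outcome1, outcome2)], n
    return None, n


if __name__ == "__main__":
    alpha = beta = 0.05
    A, B = (1 - beta) / alpha, beta / (1 - alpha)
    # Artificial constant observations, chosen only to exercise each branch.
    for value in (-1.0, 0.5, 2.0):
        print(value, sobel_wald_decision([value] * 40, 0.0, 1.0, 0.5, 1.0, A, B))
```

The example values ($a_1 = 0$, $a_2 = 1$, $\Delta = 0.5$, $\sigma^2 = 1$, $\alpha = \beta = 0.05$) are arbitrary; with real data the boundaries could equally be tabulated in advance, as the text suggests.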

The proof above that $\theta_1$ and $\theta_4$ cannot both be accepted consists of noting that
a point below the acceptance line for $\theta_1$ is already below the rejection line for
$\theta_4$ and that a point above the acceptance line for $\theta_4$ is already above the rejection
line for $\theta_1$.

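If $(\hat A, \hat B) \ne (A, B)$, a necessary and sufficient condition for the impossibility
of accepting $\theta_1$ and $\theta_4$ is that at $n = 1$ the following inequalities should hold.

Rejection Number (of $\theta_1$) for $R_1$ $\le$ Rejection Number (of $\theta_3$) for $R_2$

and

Acceptance Number (of $\theta_1$) for $R_1$ $\le$ Acceptance Number (of $\theta_3$) for $R_2$.

In symbols

$$\frac{\sigma^2}{\Delta} \log A + a_1 \le \frac{\sigma^2}{\Delta} \log \hat A + a_2$$

and

$$\frac{\sigma^2}{\Delta} \log B + a_1 \le \frac{\sigma^2}{\Delta} \log \hat B + a_2.$$

These can be written as

$$\frac{A}{\hat A} \le e^{d\Delta/\sigma^2} \qquad \text{and} \qquad \frac{B}{\hat B} \le e^{d\Delta/\sigma^2}$$

respectively, where $d = a_2 - a_1$.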



Since $d > 0$, the above inequalities are certainly fulfilled when

$$\frac{B}{\hat B} \le 1 \qquad \text{and} \qquad \frac{A}{\hat A} \le 1. \tag{2.1}$$

In what follows in this paper, we shall restrict ourselves to cases where acceptance
of both $\theta_1$ and $\theta_4$ is impossible, even if this is not stated explicitly.



3. Derivation of OC Functions. Let $L(H_i \mid \theta, R)$ denote the probability of
accepting $H_i$ when $\theta$ is the true mean and $R$ is the sequential rule used. Let $H_{\theta_i}$
denote the hypothesis that $\theta = \theta_i$. Since, as shown above, $H_1$ is accepted if and
only if $\theta_1$ is accepted, we have

(3.1) $$L(H_1 \mid \theta, R) = L(H_{\theta_1} \mid \theta, R_1).$$

Similarly,

(3.2) $$L(H_3 \mid \theta, R) = L(H_{\theta_4} \mid \theta, R_2).$$





A SEQUENTIAL DECISION PROCEDURE 


507 


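From the fact that $R_1$ and $R_2$ each terminate at some finite stage with probability
one, it follows that $R$ will terminate at some finite stage with probability
one. Hence

(3.3) $$L(H_2 \mid \theta, R) = 1 - L(H_1 \mid \theta, R) - L(H_3 \mid \theta, R).$$

From pp. 50-52 of [1], the following equations are obtained.

(3.4) $$L(H_1 \mid \theta, R) = L(H_{\theta_1} \mid \theta, R_1) = \frac{A^{h_1} - 1}{A^{h_1} - B^{h_1}}$$

where

$$h_1 = h_1(\theta) = \frac{\theta_2 + \theta_1 - 2\theta}{\theta_2 - \theta_1} = \frac{2(a_1 - \theta)}{\Delta},$$

(3.5) $$L(H_{\theta_3} \mid \theta, R_2) = \frac{\hat A^{h_2} - 1}{\hat A^{h_2} - \hat B^{h_2}}$$

where

$$h_2 = h_2(\theta) = \frac{\theta_4 + \theta_3 - 2\theta}{\theta_4 - \theta_3} = \frac{2(a_2 - \theta)}{\Delta}.$$

These equations involve an approximation, as explained in [1].
Hence

(3.6) $$L(H_3 \mid \theta, R) = L(H_{\theta_4} \mid \theta, R_2) = 1 - L(H_{\theta_3} \mid \theta, R_2) = \frac{1 - \hat B^{h_2}}{\hat A^{h_2} - \hat B^{h_2}}$$

(3.7) $$L(H_2 \mid \theta, R) = 1 - \frac{A^{h_1} - 1}{A^{h_1} - B^{h_1}} - \frac{1 - \hat B^{h_2}}{\hat A^{h_2} - \hat B^{h_2}} = \frac{1 - B^{h_1}}{A^{h_1} - B^{h_1}} - \frac{1 - \hat B^{h_2}}{\hat A^{h_2} - \hat B^{h_2}}.$$

Since $L(H_1 \mid \theta, R) = L(H_{\theta_1} \mid \theta, R_1)$, it follows that $L(H_1 \mid \theta, R)$ is a monotonically
decreasing function of $\theta$ and that

$$L(H_1 \mid -\infty, R) = 1; \qquad L(H_1 \mid \infty, R) = 0$$

$$L(H_1 \mid \theta_1, R) = 1 - \alpha; \qquad L(H_1 \mid \theta_2, R) = \beta$$

$$L(H_1 \mid a_1, R) = \frac{\log A}{\log A + |\log B|}.$$

Similarly, since $L(H_3 \mid \theta, R) = 1 - L(H_{\theta_3} \mid \theta, R_2)$, it follows that $L(H_3 \mid \theta, R)$
is a monotonically increasing function of $\theta$ and that

$$L(H_3 \mid -\infty, R) = 0; \qquad L(H_3 \mid \infty, R) = 1$$

$$L(H_3 \mid \theta_3, R) = \hat\alpha; \qquad L(H_3 \mid \theta_4, R) = 1 - \hat\beta$$

$$L(H_3 \mid a_2, R) = \frac{|\log \hat B|}{\log \hat A + |\log \hat B|}.$$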





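Since $L(H_2 \mid \theta, R) = 1 - L(H_1 \mid \theta, R) - L(H_3 \mid \theta, R)$ it follows easily from the
above results that

$$L(H_2 \mid -\infty, R) = 0; \qquad L(H_2 \mid \infty, R) = 0$$

$$L(H_2 \mid \theta, R) < \alpha \ \text{ for } \theta \le \theta_1; \qquad L(H_2 \mid \theta, R) < \hat\beta \ \text{ for } \theta \ge \theta_4$$

$$\frac{|\log B|}{\log A + |\log B|} - \hat\alpha < L(H_2 \mid a_1, R) < \frac{|\log B|}{\log A + |\log B|}$$

$$\frac{\log \hat A}{\log \hat A + |\log \hat B|} - \beta < L(H_2 \mid a_2, R) < \frac{\log \hat A}{\log \hat A + |\log \hat B|}$$

$$1 - \beta - \hat\alpha < L(H_2 \mid \theta, R) < 1 \ \text{ for } \theta_2 \le \theta \le \theta_3.$$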
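
The OC formulas of this section can be evaluated numerically. The following is a minimal sketch, not part of the original paper: the function names are hypothetical, the hatted constants are passed as `A_hat` and `B_hat`, and the quotient $(A^h - 1)/(A^h - B^h)$ is rearranged algebraically so that no power exceeds one (an implementation choice made here to avoid overflow, assuming $B < 1 < A$).

```python
import math

def oc_accept_lower(theta, lo, hi, A, B):
    """Approximate probability that an SPRT of mean=lo against mean=hi
    accepts mean=lo when the true mean is theta: (A^h - 1)/(A^h - B^h)."""
    h = (hi + lo - 2.0 * theta) / (hi - lo)
    if abs(h) < 1e-12:
        # Limiting value at the midpoint, where h = 0.
        return math.log(A) / (math.log(A) - math.log(B))
    if h > 0:
        # Divide numerator and denominator by A^h.
        return (1.0 - A ** (-h)) / (1.0 - (B / A) ** h)
    g = -h
    # Multiply numerator and denominator by B^g (g = -h > 0).
    return (B ** g - (B / A) ** g) / (1.0 - (B / A) ** g)

def oc_three(theta, th1, th2, th3, th4, A, B, A_hat, B_hat):
    """Return (L(H1), L(H2), L(H3)) for the combined rule R."""
    p1 = oc_accept_lower(theta, th1, th2, A, B)                   # (3.4)
    p3 = 1.0 - oc_accept_lower(theta, th3, th4, A_hat, B_hat)     # (3.6)
    return p1, 1.0 - p1 - p3, p3                                  # (3.3)
```

With $A = \hat A = 1/B = 1/\hat B = (1-\gamma)/\gamma$ the value at $\theta_1$ is exactly $1 - \gamma$, which gives a quick check of the implementation.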

4. Probability of Correct Decision. Denote the probability of a correct
decision by $L(\theta \mid R)$. It is defined as follows:

Interval                        Correct Decisions            $L(\theta \mid R)$

$\theta \le \theta_1$                   acceptance of $H_1$            $L(H_1 \mid \theta, R)$

$\theta_1 < \theta < \theta_2$          acceptance of $H_1$ or $H_2$   $L(H_1 \mid \theta, R) + L(H_2 \mid \theta, R)$

$\theta_2 \le \theta \le \theta_3$      acceptance of $H_2$            $L(H_2 \mid \theta, R)$

$\theta_3 < \theta < \theta_4$          acceptance of $H_2$ or $H_3$   $L(H_2 \mid \theta, R) + L(H_3 \mid \theta, R)$

$\theta_4 \le \theta$                   acceptance of $H_3$            $L(H_3 \mid \theta, R)$

It should be noted that at points of discontinuity, $L(\theta \mid R)$ is defined as the
smaller of the two limiting values.

We shall now discuss some monotonicity properties of the function $L(\theta \mid R)$.
From the fact that $L(H_{\theta_1} \mid \theta, R_1)$ and $L(H_{\theta_3} \mid \theta, R_2)$ are continuous with continuous
first and second derivatives and are monotonically decreasing for all
$\theta$ with a single point of inflection in the intervals $\theta_1 < \theta < \theta_2$ and $\theta_3 < \theta < \theta_4$
respectively, it follows that

(i) $L(\theta \mid R)$ is monotonically decreasing with negative curvature for $-\infty < \theta \le \theta_1$.

(ii) $L(\theta \mid R)$ is monotonically increasing with negative curvature for $\theta_4 \le \theta < \infty$.

Making use of (3.3) we have further

(iii) $L(\theta \mid R)$ is monotonically decreasing with negative curvature for $\theta_1 < \theta < \theta_2$.

(iv) $L(\theta \mid R)$ is monotonically increasing with negative curvature for $\theta_3 < \theta < \theta_4$.

(v) For $\theta_2 \le \theta \le \theta_3$, $\frac{d}{d\theta} L(\theta \mid R) = -\left[\frac{d}{d\theta} L(H_1 \mid \theta, R) + \frac{d}{d\theta} L(H_3 \mid \theta, R)\right]$ is
decreasing, since $\frac{d}{d\theta} L(H_1 \mid \theta, R)$ and $\frac{d}{d\theta} L(H_3 \mid \theta, R)$ are increasing. In
other words $L(\theta \mid R)$ has negative curvature for $\theta_2 \le \theta \le \theta_3$.






In the special case when $A = \hat A = \frac{1}{B} = \frac{1}{\hat B}$ and the origin is taken at $\frac{a_1 + a_2}{2}$
for the sake of convenience, it is easy to see that $L(\theta \mid R)$ is symmetric with
respect to the origin and, because of (v), has a local maximum at $\theta = 0$.

5. Choice of the constants $A$, $B$, $\hat A$, $\hat B$ to insure prescribed Lower Bounds
for $L(\theta \mid R)$. We shall deal here with the question of choosing $A$, $B$, $\hat A$ and $\hat B$
such that $L(\theta \mid R) \ge 1 - \gamma_1$ when $\theta \le \theta_1$, $L(\theta \mid R) \ge 1 - \gamma_2$ when $\theta_2 \le \theta \le \theta_3$
and $L(\theta \mid R) \ge 1 - \gamma_3$ when $\theta \ge \theta_4$. From the monotonic properties of the correct
decision function it is only necessary to insure that

(5.1) $L(\theta_1 \mid R) = 1 - \gamma_1$, $L(\theta_2 \mid R) = L(\theta_3 \mid R) = 1 - \gamma_2$ and $L(\theta_4 \mid R) = 1 - \gamma_3$.
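The following relations will be needed:

$$h_1(\theta_1) = h_2(\theta_3) = 1 = -h_1(\theta_2) = -h_2(\theta_4)$$

$$h_2(\theta_2) = \frac{\theta_4 + \theta_3 - 2\theta_2}{\Delta} = \frac{2d}{\Delta} - 1 = r \ \text{(say)}$$

$$h_1(\theta_3) = \frac{\theta_2 + \theta_1 - 2\theta_3}{\Delta} = -r$$

where $d = \theta_4 - \theta_2 = \theta_3 - \theta_1 = a_2 - a_1$.

The following four equations are obtained from (5.1):

(5.2) $$1 - L(H_1 \mid \theta_1, R) = L(H_{\theta_2} \mid \theta_1, R_1) = \frac{1 - B}{A - B} = \gamma_1$$

(5.3) $$1 - L(H_2 \mid \theta_2, R) = L(H_1 \mid \theta_2, R) + L(H_3 \mid \theta_2, R) = \frac{B(A - 1)}{A - B} + \left[\frac{1 - \hat B^r}{\hat A^r - \hat B^r}\right] = \gamma_2$$

(5.4) $$1 - L(H_2 \mid \theta_3, R) = L(H_3 \mid \theta_3, R) + L(H_1 \mid \theta_3, R) = \frac{1 - \hat B}{\hat A - \hat B} + \left[\frac{B^r(A^r - 1)}{A^r - B^r}\right] = \gamma_2$$

(5.5) $$1 - L(H_3 \mid \theta_4, R) = L(H_{\theta_3} \mid \theta_4, R_2) = \frac{\hat B(\hat A - 1)}{\hat A - \hat B} = \gamma_3.$$

The “bracketed terms” represent quantities less than $\hat\alpha$ and $\beta$ respectively and
if $r$ is sufficiently large they can be neglected. This will be made more precise
but first let us note the results of neglecting the bracketed terms.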

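From (5.2) and (5.3) we obtain

(5.6) $$B(1 - \gamma_1) = \gamma_2, \quad\text{whence}\quad B = \frac{\gamma_2}{1 - \gamma_1}.$$

From (5.2) and (5.6)

(5.7) $$A = \frac{1 - (1 - \gamma_1)B}{\gamma_1}, \quad\text{whence}\quad A = \frac{1 - \gamma_2}{\gamma_1}.$$

Since the last two equations are obtained from the first two by the permutation
$(A, B, \gamma_1) \to (1/\hat B, 1/\hat A, \gamma_3)$, we have

$$\hat B = \frac{\gamma_3}{1 - \gamma_2}, \qquad \hat A = \frac{1 - \gamma_3}{\gamma_2}.$$

If $\gamma_1 = \gamma_2 = \gamma_3 = \gamma$ (say) then $A = \hat A = \frac{1}{B} = \frac{1}{\hat B} = \frac{1 - \gamma}{\gamma}$.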

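We shall consider the bracketed quantities negligible if the result of neglecting
them produces a change of less than 20% in $[1 - L(\theta \mid R)]$ at $\theta = \theta_2$, $\theta_3$ respectively,
i.e., if

(5.8) $$\frac{1 - \hat B^r}{\hat A^r - \hat B^r} = \frac{1 - \left(\frac{\gamma_3}{1 - \gamma_2}\right)^r}{\left(\frac{1 - \gamma_3}{\gamma_2}\right)^r - \left(\frac{\gamma_3}{1 - \gamma_2}\right)^r} \le \frac{\gamma_2}{5}$$

(5.9) $$\frac{B^r(A^r - 1)}{A^r - B^r} = \frac{\left(\frac{\gamma_2}{1 - \gamma_1}\right)^r \left[\left(\frac{1 - \gamma_2}{\gamma_1}\right)^r - 1\right]}{\left(\frac{1 - \gamma_2}{\gamma_1}\right)^r - \left(\frac{\gamma_2}{1 - \gamma_1}\right)^r} \le \frac{\gamma_2}{5}.$$

Inequality (5.9) can be written as

$$\frac{\gamma_2^r\left[(1 - \gamma_2)^r - \gamma_1^r\right]}{(1 - \gamma_2)^r (1 - \gamma_1)^r - \gamma_1^r \gamma_2^r} \le \frac{\gamma_2}{5}$$

or

$$(1 - \gamma_2)^r \left[\gamma_2^r - \frac{\gamma_2}{5}(1 - \gamma_1)^r\right] \le (\gamma_1 \gamma_2)^r \left(1 - \frac{\gamma_2}{5}\right).$$

This will certainly hold if

$$\gamma_2^r \le \frac{\gamma_2}{5}(1 - \gamma_1)^r$$

or

$$\left(\frac{\gamma_2}{1 - \gamma_1}\right)^r \le \frac{\gamma_2}{5}.$$

Assume that $\gamma_1$, $\gamma_2$ and $\gamma_3$ are each less than $\frac{1}{2}$. Then the last inequality can be
written as

(5.10) $$r \ge \frac{\log \frac{5}{\gamma_2}}{\log \frac{1 - \gamma_1}{\gamma_2}}.$$

Starting with (5.8) the same relation is obtained except that $\gamma_1$ is replaced by
$\gamma_3$, namely

(5.11) $$r \ge \frac{\log \frac{5}{\gamma_2}}{\log \frac{1 - \gamma_3}{\gamma_2}}.$$

Let

$$k = \frac{\log \frac{5}{\gamma_2}}{\log \frac{1 - \bar\gamma}{\gamma_2}}$$

where $\bar\gamma$ is the larger of $\gamma_1$ and $\gamma_3$. Then $k$ is the larger of the right hand members
of (5.10) and (5.11). Then for (5.8) and (5.9) to hold it is sufficient that

$$r \ge k.$$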


If $\gamma_2 = .05$ and $0 < \gamma_1, \gamma_3 < .1$ then $k$ is approximately $\frac{2}{1.3} = 1.54$. If $\gamma_2 = .01$
and $0 < \gamma_1, \gamma_3 < .1$ then $k$ is approximately $\frac{2.7}{2} = 1.35$.
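
The approximate constants of this section and the threshold $k$ are simple to compute. The sketch below is illustrative only and assumes the relations $A = (1-\gamma_2)/\gamma_1$, $B = \gamma_2/(1-\gamma_1)$, $\hat A = (1-\gamma_3)/\gamma_2$, $\hat B = \gamma_3/(1-\gamma_2)$, $r = 2d/\Delta - 1$ and $k = \log(5/\gamma_2)/\log((1-\bar\gamma)/\gamma_2)$; the function names are hypothetical.

```python
import math

def sprt_constants(g1, g2, g3):
    """Approximate (A, B, A_hat, B_hat) obtained by neglecting the
    bracketed terms in (5.3) and (5.4)."""
    A = (1.0 - g2) / g1
    B = g2 / (1.0 - g1)
    A_hat = (1.0 - g3) / g2
    B_hat = g3 / (1.0 - g2)
    return A, B, A_hat, B_hat

def k_threshold(g1, g2, g3):
    """Smallest r for which the bracketed terms change 1 - L by under 20%."""
    g_bar = max(g1, g3)
    return math.log(5.0 / g2) / math.log((1.0 - g_bar) / g2)

def r_value(d, delta):
    """r = h2(theta_2), with d = a2 - a1 and delta = theta_2 - theta_1."""
    return 2.0 * d / delta - 1.0

def both_outer_impossible(A, B, A_hat, B_hat):
    """Sufficient condition (2.1) for never accepting theta_1 and theta_4."""
    return A <= A_hat and B <= B_hat
```

For $\gamma_1 = \gamma_2 = \gamma_3 = .029$, $d = 1/2$ and $\Delta = 1/8$ this gives $A \approx 33.5$, $k \approx 1.47$ and $r = 7$, so the approximation applies.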

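We shall now investigate under what conditions the approximate solution
obtained above for $A$, $B$, $\hat A$, $\hat B$ is such that acceptance of both $\theta_1$ and $\theta_4$ is
impossible.

It follows from (2.1) that the following pair of inequalities is sufficient for
the impossibility of accepting both $\theta_1$ and $\theta_4$:

(5.12) $$\frac{A}{\hat A} = \frac{\gamma_2}{\gamma_1} \cdot \frac{1 - \gamma_2}{1 - \gamma_3} \le 1, \qquad \frac{B}{\hat B} = \frac{\gamma_2}{\gamma_3} \cdot \frac{1 - \gamma_2}{1 - \gamma_1} \le 1.$$

If $\gamma_1 \ne \gamma_3$ let the smaller and larger of the pair $(\gamma_1, \gamma_3)$ be denoted by $\underline\gamma$ and
$\bar\gamma$ respectively. Since $1 - \underline\gamma > 1 - \bar\gamma$, then

$$\frac{\gamma_2(1 - \gamma_2)}{\underline\gamma(1 - \bar\gamma)} \ge \frac{\gamma_2(1 - \gamma_2)}{\bar\gamma(1 - \underline\gamma)}$$

and we need only consider one of the two inequalities in (5.12). The condition
$\gamma_2 < \underline\gamma$ will in general satisfy (5.12). More precisely if all the $\gamma$'s are restricted
to the interval $(0, .1)$ then

$$\frac{9}{10} \le \frac{1 - \gamma_2}{1 - \underline\gamma} \le \frac{1 - \gamma_2}{1 - \bar\gamma} \le \frac{10}{9}$$

and it is sufficient for the validity of (5.12) that $\gamma_2 \le (.9)\underline\gamma$.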





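If $\gamma_1 = \gamma_3 = \gamma$ (say) then the two inequalities reduce to one

$$\gamma_2^2 - \gamma_2 + \gamma - \gamma^2 \ge 0$$

which can be written as

$$(\gamma_2 - \gamma)(\gamma_2 - 1 + \gamma) \ge 0.$$

Since the inequality $\gamma_2 \ge 1 - \gamma$ is impossible when all $\gamma$'s are $< \frac{1}{2}$, we see that
$\gamma_2 < \gamma$ is sufficient for the validity of (5.12) when $\gamma_1 = \gamma_3 = \gamma < \frac{1}{2}$.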

There remains the problem of finding an approximate solution for equations 
(5.2) to (5.5) when r < k. Since 

d — 7) Ot — di + ^ 


2 2 


we merely have to consider the interval 1 ^ r < A:. 
The following approximations are used 


(5.13) 


A - B A’ A - B ’ A' - B' A' 


1-5 1 B^(A' - 1) 

■2-2 ~ o 

A - B A A^ - B' 


MA - 1 ) 
A - B 


^ B, 


which upon substitution yield 

(5.14) 

(5.15) 

(5.16) 

(5.17) 


71 

6 - yz 

B + Jr =yz 
^ = 72 . 


Subtraction of (5.17) from (5.16) shows that B = 4 is a solution. Substituting 

A 

this result back in (5.16) leads to the equation 

(5.18) B + = y2. 

It can easily be verified that between zero and unity this equation has exactly 
one root. Since 1 ^ r < the root of the above equation lies between ^ and 

A 

72- 

Taking 72 as a first approximation for B and substituting 72 + e for 5 in 

(5.18) , we obtain 


« + (73 + e)' «» 0 . 





Expanding (tj + id a power series in c and neglecting second and higher 
order terms, the above equation gives 


Thus, 

(5.19) 


e 


yt 


1 + Tyl * 


B = 


1 


72 

1 + ry'i ‘ 


72(1 + (r — 1)72 *] 
1 + ryr' 


It is necessary to investigate under what conditions the above approximate 
solution satisfies (5.2) to (5.5) to within a 20% error in [1 — Li$/R)], i.e., such that 


(520) 

7i ^ 7i(l — E) . ^ 7i 

l-y,B 5 


(5.21) 

78 ^ 78(1 - B) _ ^ 73 

" 5 ^ 1 - t.B “ < 6 


(5.22) 

72 ^ B(l — 7 i) B'(l — yl) 

5 1 - 7x5 1 - (yzBY 

^ 72 
72 < -r 
5 

(5.23) 

72 ^ 5(1 — 72 ) , 5'(1 — 7i) 

5 ^ 1 - 785 1 - (yyB)' 

72 

where for B the value in (5.19) is understood. 



It can be shown that if $\gamma_1$, $\gamma_2$, $\gamma_3$ are each between zero and .1 then the
inequalities (5.20) to (5.23) hold. Furthermore it can be shown that if, in addition,
$\gamma_2 \le \min(\gamma_1, \gamma_3)$ then also the inequalities (2.1) hold. The latter inequalities are
sufficient to ensure the impossibility of accepting both $\theta_1$ and $\theta_4$.


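6. Bounds for the ASN Function. First we shall derive lower bounds for
the ASN function. Let $E(n/\theta, R)$ denote the expected value of $n$ when $\theta$ is the
true mean and $R$ is the sequential rule employed. For $\theta < \theta_2$ the probability of
coming to a decision first with $R_2$ is large and therefore

$$E(n/\theta, R) \approx E(n/\theta, R_1), \qquad \theta < \theta_2.$$

From the definition of $R$ it follows that

$$E(n/\theta, R) > E(n/\theta, R_1) \quad \text{for all } \theta.$$

Hence $E(n/\theta, R_1)$ serves as a close lower bound when $\theta < \theta_2$.

Similarly

$$E(n/\theta, R) \approx E(n/\theta, R_2) \quad \text{for } \theta > \theta_3$$

$$E(n/\theta, R) > E(n/\theta, R_2) \quad \text{for all } \theta.$$

Hence $E(n/\theta, R_2)$ serves as a close lower bound for $\theta > \theta_3$.

Combining the above we have

(6.1) $$E(n/\theta, R) > \max\left[E(n/\theta, R_1),\ E(n/\theta, R_2)\right]$$

where, neglecting the excess over the boundary,

(6.2) $$E(n/\theta, R_1) = \frac{L(H_{\theta_1} \mid \theta, R_1)\log B + L(H_{\theta_2} \mid \theta, R_1)\log A}{\frac{\Delta}{\sigma^2}(\theta - a_1)}$$

(6.3) $$E(n/\theta, R_2) = \frac{L(H_{\theta_3} \mid \theta, R_2)\log \hat B + L(H_{\theta_4} \mid \theta, R_2)\log \hat A}{\frac{\Delta}{\sigma^2}(\theta - a_2)}.$$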


Formula (6.1) gives a valid lower bound over the whole range of $\theta$, but this
lower bound will not be very close in the interval $(\theta_2, \theta_3)$, particularly in the
neighbourhood of the mid-point $\frac{\theta_2 + \theta_3}{2}$. The authors were not able to find any
simple method for obtaining a closer lower bound in this interval. The upper
bound given later in this section will, however, be fairly close also in the interval
$(\theta_2, \theta_3)$ and can be used as an approximation to the exact value.

We shall now derive upper bounds for the ASN function. Let $R_1^*$ be the following
rule: “Continue to take observations until $R_1$ accepts $\theta_1$.” Since this implies
the rejection of $\theta_4$ at the same or at a previous stage, it follows that $R$ must
terminate not later than $R_1^*$. Hence

(6.4) $$E(n/\theta, R_1^*) \ge E(n/\theta, R).$$

As a matter of fact one can easily verify that $E(n/\theta, R_1^*) > E(n/\theta, R)$. Thus
$E(n/\theta, R_1^*)$ is an upper bound for $E(n/\theta, R)$. This upper bound will be close
when the probability of accepting $\theta_1$ is high, i.e., for $\theta \le \theta_1$.

By the general formula 


Ein) = 


E{z) 


(see p. 53 [1]) we obtain, upon neglecting the excess over the boundary. 


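(6.5) $$E(n/\theta, R_1^*) = \frac{\log B}{\frac{\Delta}{\sigma^2}(\theta - a_1)}.$$

This coincides with (6.2) when $L(H_{\theta_2} \mid \theta, R_1) = 0$.

Similarly, if $R_2^*$ denotes the rule of continuing until $R_2$ accepts $\theta_4$, then

(6.6) $$E(n/\theta, R_2^*) > E(n/\theta, R)$$

(6.7) $$E(n/\theta, R_2^*) = \frac{\log \hat A}{\frac{\Delta}{\sigma^2}(\theta - a_2)}$$

and this will be a close upper bound for $\theta \ge \theta_4$.
If $A = \hat A = \frac{1}{B} = \frac{1}{\hat B}$ and if $a_1 + a_2 = 0$ the above results reduce to

(6.8) $$E(n/\theta, R) \lesssim E(n/\theta, R_1^*) = \frac{-h}{\theta + \lambda} \quad \text{for } \theta \le \theta_1,$$

(6.9) $$E(n/\theta, R) \lesssim E(n/\theta, R_2^*) = \frac{h}{\theta - \lambda} \quad \text{for } \theta \ge \theta_4,$$

where the symbol $\lesssim$ stands for a close inequality, and where

$$h = \frac{\sigma^2}{\Delta}\log A \quad\text{and}\quad \lambda = a_2 = -a_1.$$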
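
As a quick numerical illustration of the outer upper bounds in the symmetric case, the sketch below evaluates $-h/(\theta + \lambda)$ for $\theta < -\lambda$ and $h/(\theta - \lambda)$ for $\theta > \lambda$. The function name is hypothetical, and the bound is only claimed to be close for $\theta \le \theta_1$ or $\theta \ge \theta_4$.

```python
def asn_upper_outer(theta, h, lam):
    """Upper bound for E(n/theta, R) outside [-lam, lam] in the symmetric
    case A = A_hat = 1/B = 1/B_hat with a2 = -a1 = lam,
    neglecting the excess over the boundary."""
    if theta > lam:
        return h / (theta - lam)      # rule R2*: wait for R2 to accept theta_4
    if theta < -lam:
        return -h / (theta + lam)     # rule R1*: wait for R1 to accept theta_1
    raise ValueError("bound applies only for |theta| > lam")
```

With $h = 28$ and $\lambda = 1/4$, the value at $\theta = 5/16$ is 448 and the value at $\theta = 8/16$ is 112.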

To establish an upper bound for the ASN function in the interval $\theta_2 < \theta < \theta_3$
we shall restrict ourselves to the case where $A = \hat A = \frac{1}{B} = \frac{1}{\hat B}$. These relations are
fulfilled by the approximate values of $A$, $B$, $\hat A$, $\hat B$ suggested in Section 5 when
$\gamma_1 = \gamma_2 = \gamma_3$ and $r \ge k$. We shall choose the origin to be at $\frac{a_1 + a_2}{2}$, i.e., we put
$a_1 + a_2 = 0$. Then the vertex $P$ of the triangle $(P_1, P_2, P)$ in diagram 1 lies on
the abscissa axis and $OP_1 = OP_2 = h$. The abscissa of the vertex $P$ is $\frac{h}{\lambda} = N$ (say)
where $\lambda = a_2 = -a_1$. Let $y = \sum_{i=1}^{N} x_i$ represent the sum of the first $N$ observations.
Let $R_{23}$ denote the rule: “Continue until both $\theta_2$ and $\theta_3$ are accepted”.
This is tantamount to neglecting the two outer lines in diagram 1, i.e., the acceptance
lines for $\theta_1$ and $\theta_4$. Then clearly,

(6.10) $$E(n/\theta, R_{23}) > E(n/\theta, R).$$

When $\theta$ lies between $\theta_2$ and $\theta_3$ this inequality will be close, since the probability of
crossing either of the two outer lines is then small.

However $E(n/\theta, R_{23})$ was found difficult to compute and it was necessary to
consider instead the rule $R'_{23}$: “Take $N$ observations. If $y = \sum_{i=1}^{N} x_i < 0$ then
continue until $\theta_2$ is accepted. If $y > 0$ then continue until $\theta_3$ is accepted”.* Clearly,

(6.11) $$E(n/\theta, R'_{23}) > E(n/\theta, R_{23}).$$

This inequality, however, will be close only if the probability of concluding the
test before $N$ observations, given that $\theta_2 < \theta < \theta_3$, is small.

Some investigations by the authors seem to indicate that the inequality (6.11)
will be close when $\Delta < \lambda$. This inequality is likely to be fulfilled in practical
problems.

We shall now proceed to determine the value of $E(n/\theta, R'_{23})$. Neglecting the
excess over the boundary, we have

(6.12) $$E\left(n/\theta, R'_{23}, \sum x_i = y\right) = N + \frac{y}{\lambda - \theta} \quad \text{for } y > 0$$

and

(6.13) $$E\left(n/\theta, R'_{23}, \sum x_i = y\right) = N - \frac{y}{\lambda + \theta} \quad \text{for } y < 0$$

where, for any condition $C$, $E(n/\theta, R, C)$ denotes the conditional expected value
of $n$ given that the true mean is $\theta$, that $R$ is the sequential rule used and that the
condition $C$ is fulfilled.

* The event $y = 0$ has probability zero and it is indifferent what rule is adopted for that
case.

Multiplying with the density of $y$ and then integrating with respect to $y$, we
obtain after simplification

(6.14) E(n/e, ft«) = [ax + (i + 2a j/g 

r g-<»*/2) 

dy, and Ot < 0 < 0%. 

In particular, for $\theta = 0$ we get

(6.15) $$E(n/\theta = 0, R'_{23}) = N + \frac{\sigma}{\lambda}\sqrt{\frac{2N}{\pi}}.$$
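
The integration leading to (6.14) can be checked numerically. The sketch below does not use the printed form of (6.14); it assumes instead that $y$ is normal with mean $N\theta$ and variance $N\sigma^2$, that the conditional expectation is $N + y/(\lambda - \theta)$ for $y > 0$ and $N - y/(\lambda + \theta)$ for $y < 0$, and it uses the standard closed form $E[\max(y, 0)] = s\,\varphi(\mu/s) + \mu\,\Phi(\mu/s)$ with $\Phi$ the ordinary normal distribution function. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def asn_upper_middle(theta, h, lam, sigma=1.0):
    """Upper bound E(n/theta, R'_23) for |theta| < lam, obtained by
    averaging the conditional expectations over y ~ N(N*theta, N*sigma^2),
    where N = h/lam."""
    N = h / lam
    mu = N * theta
    s = sigma * math.sqrt(N)
    std = NormalDist()
    z = mu / s
    e_pos = s * std.pdf(z) + mu * std.cdf(z)   # E[max(y, 0)]
    e_neg = e_pos - mu                         # E[max(-y, 0)]
    return N + e_pos / (lam - theta) + e_neg / (lam + theta)
```

With $h = 28$, $\lambda = 1/4$ and $\sigma = 1$ this gives approximately 146, 163, 229 and 450 at $\theta = 0, 1/16, 2/16, 3/16$, and at $\theta = 0$ it reduces to $N + (\sigma/\lambda)\sqrt{2N/\pi}$.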


To establish a close upper bound for $\theta_3 < \theta < \theta_4$ we must bring the line of
acceptance of $\theta_4$ into account. The line of acceptance of $\theta_1$ can be disregarded
since the probability of accepting $\theta_1$ is very small.

We therefore define the rule $R_{34}$ as follows:

“Continue with $R_1$ until $\theta_2$ is accepted and with $R_2$ until either $\theta_3$ or $\theta_4$ is
accepted.”

Since the ASN function for $R_{34}$ is difficult to compute we define a modified
rule $R'_{34}$ as follows:

“Proceed to take $N$ observations without regard to any rule. If $y = \sum_{i=1}^{N} x_i < 0$
then continue only with $R_1$ until $\theta_2$ is accepted. If $0 < y < 2h$ then
continue only with $R_2$ until either $\theta_3$ or $\theta_4$ is accepted. If $y \ge 2h$ then stop taking
observations and accept $H_3$.”

It is clear that the following inequalities hold

(6.16) $$E(n/\theta, R'_{34}) > E(n/\theta, R_{34}) > E(n/\theta, R).$$

The proximity of $E(n/\theta, R_{34})$ and $E(n/\theta, R)$, as stated above, is based on the
fact that the probability of accepting $\theta_1$, when $\theta_3 < \theta < \theta_4$, is small.

The proximity of $E(n/\theta, R'_{34})$ and $E(n/\theta, R_{34})$ is assured if the probability
of terminating with $R_{34}$ (and with $R$) before $N$ observations is small. It can be
shown that the latter condition is fulfilled when $\Delta < \lambda$. In terms of the quantity
$r$ defined in Section 5 this can be written as $r > 3$.

To determine the value of $E(n/\theta, R'_{34})$ the following two preliminary results
will be needed:





If 0 < y < 2h, 


, 2h — y — 2h 


i_ 1 _ g-(4»/»‘)(X-»> J 

1 X^ 

9 - X 


= Cisay). 

lfy<0, 

(6.18) E (n/e, RU ,J2xi = y^ = ^- = -D (say). 

Both are easily obtained from formula (7.25) on p. 123 of [1]. 

Multiplying with the density of y and integrating with respect to y, we obtain 
after simplification 

EWe. B.'.) - * + [♦ (5^* + * (I yl)] 


2e' 




(6.19) 


(X-0)|\ , , 

' 1 + e 


+ 


r -()i»»/2X»») _ -A(2X-»)»/2X»«i 

X(X - 0) r 22r 


kd 


2X(x + e) 




-(A»S/2X»») 


Formula (6.19) is an improvement on (6.14) as it will give for any 0 a smaller 
upper bound, but in the neighborhood of the origin the difference is insignificant. 
For 9 = X we obtain from (6.19) using L’Hopital’s rule 


Ein/\f Ru) — 


( 6 . 20 ) 


4X<7’* 


(4/iX - 3<r*) 




-(*X/i!»«) 


If 


\/h\ 


> 2.5, the above formula can be approximated by 


-(*X/2»«) 


(6.21) E(n/\, Ru) ~ ^ ' 

ft* ft* 

Since the right hand member above lies between and (1.002) when 
•\/n ff* <r* 

> 2.5 then for practical purposes 

ff 


( 6 . 22 ) 


E(n/\, Ru)<^-, 


when > 2.5^. 





An upper bound for $E(n/\theta, R)$ for $\theta_1 < \theta < \theta_2$ can be obtained by defining
$R_{12}$ and $R'_{12}$ in an analogous way to $R_{34}$ and $R'_{34}$. Because of reasons of symmetry,
$E(n/\theta, R'_{12})$ can be obtained from (6.19) by replacing $\theta$ by $-\theta$.

The method used for obtaining upper bounds for $E(n/\theta, R)$ can easily be
extended to the more general case when the equalities $A = \hat A = \frac{1}{B} = \frac{1}{\hat B}$ do not
necessarily hold. However, the resulting formulas are more cumbersome and we
shall merely give without proof the upper bound corresponding to (6.14). This
upper bound becomes

£(«/., fii) = AT + - *«.)] + - ♦(»] 


where 


All = ^ log A 
h 21 = ^ log A 


Alo = ^ log B 

2 

A 20 = ^ log B 


<Z2 “ - Cil ~ \ 


A, - Nd 


h + NO 


““ <rVN ’ aVN ’ 

7. An Example. We shall consider the following example

$$\sigma^2 = 1, \quad a_2 = -a_1 = \tfrac{1}{4}, \quad \gamma_1 = \gamma_2 = \gamma_3 = \gamma = .029$$

then

$$A = \hat A = \frac{1}{B} = \frac{1}{\hat B} = \frac{1 - \gamma}{\gamma} = 33.5$$

$$r = 7 > 3 > k \approx 1.47$$

$$h = \frac{\sigma^2}{\Delta}\log A = 28, \quad \lambda = a_2 = \tfrac{1}{4}, \quad \Delta = \theta_2 - \theta_1 = \theta_4 - \theta_3 = \tfrac{1}{8}.$$

Using formulas (6.1) and (6.7) the following upper and lower bounds were obtained

θ             5/16   6/16   7/16   8/16   9/16   10/16   12/16   14/16   16/16   18/16   20/16
Upper bound   448    224    149    112    89.6   74.7    56      44.8    37.3    32      28
Lower bound   421    224    149    112    89.6   74.7    56      44.8    37.3    32      28


















Formulas (6.14) and (6.1) yield

θ             0      1/16   2/16   3/16
Upper bound   146    163    229    450
Lower bound   112    149    224    421

In the neighborhood of the origin the true value is very nearly the upper bound.
From formulas (6.19), (6.22) and (6.1) we obtain

θ             3/16   4/16    5/16
Upper bound   422    784.5   423
Lower bound   421    784     421

As shown above for the end points of the indifference zone, (6.19) gives better 
results than (6.14) or (6.7). This is as it should be since (6.19) takes into account 
possibilities omitted in (6.14) and (6.7). The greater accuracy of (6.19) is offset 
by a slight increase in computation. 

In the graph of the bounds of the ASN function shown in Figure 2, a single
curve is shown wherever the upper and lower bounds are sufficiently close to
each other.

Since (6.14) contains an even function of $\theta$ and since elsewhere the corresponding
bounds are mirror images with respect to $\theta = 0$, the bounds for negative $\theta$
are exactly the same as those for the corresponding positive $\theta$.

Consider the following non-sequential rule applied to our problem. With a
fixed number $N_0$ of observations compute the mean $\bar x$ and accept $H_1$ if $\bar x$ falls in
the interval $(-\infty, a_1)$, accept $H_2$ if $\bar x$ falls in $[a_1, a_2]$ and accept $H_3$ if $\bar x$ falls in
$(a_2, \infty)$. This is certainly a reasonable procedure. One can also verify that no
other non-sequential rule exists that is uniformly better (for all possible values of
$\theta$) than the one under consideration.

The two decision procedures become comparable if we introduce the indiffer¬ 
ence zones and define a wrong decision in the non-sequential case exactly as was 
done for our sequential procedure (see Section 1). 

For the non-sequential case (just as in the sequential case) the probability of a
wrong decision will be discontinuous at $\theta_1$, $\theta_2$, $\theta_3$ and $\theta_4$. At each of these points
there will be a left-sided and right-sided limit, different from each other. As in the
sequential case we shall take the probability of a wrong decision at a discontinuity
point to be equal to the larger of the left and right hand limits. One can easily
verify that the maximum probability of a wrong decision occurs at $\theta = \theta_3$ (which
is equal to the value at $\theta = \theta_2$).












We then determine $N_0$ by setting the maximum probability of a wrong decision
equal to $\gamma$, i.e.

(7.1) $$\Phi\left(\frac{d - \Delta/2}{\sigma}\sqrt{N_0}\right) + \Phi\left(\frac{\Delta}{2\sigma}\sqrt{N_0}\right) = 1 - \gamma.$$

[Figure 2. Upper and lower bounds for the ASN function]

For the particular problem considered above, this gives $N_0 = 915.4$. Hence
916 observations are required in order to ensure that this non-sequential procedure
will have the maximum probability $\gamma = .029$ of a wrong decision. This
is to be compared with the maximum over all $\theta$ of the ASN function in the
sequential procedure, which was 784.5.

Returning to (7.1) we shall derive lower and upper bounds for the root of that 
equation. Since 


d - A/2 


A 


«»o 






it is clear that the root of the equation 

is an upper bound for the root of (7.1) and that the root of the equation 
0 ( 00 ) + = 1 — 7 


or 


*{t - 2- - 

is a lower bound for the root of (7.1). We shall compare the value of ® 

2(7* 

with the value of y ^ ^ \/Max ASN. Since 

Z(T 6 

^2 2 / , \2 * 
Max (ASN function) ~ ~ hog —(for sufficiently small ~ ). 


then 


y = ^ VMax ASN “ log - - - (for sufficiently small ^). 

Z<r $ ^ y a 


The following table gives upper and lower bounds for $x$ and the corresponding
value of $y$ for the type of example under consideration, i.e., when $A = \hat A = \frac{1}{B} = \frac{1}{\hat B}$
and $r \ge k$.

γ       lower and upper bounds for x    y
.001    3.08-3.31                       3.45
.002    [illegible]                     3.11
.005    2.57-2.81                       2.65
.008    2.41-2.65                       2.41
.01     2.33-2.58                       2.30
.05     1.64-1.96                       1.47
.1      1.28-1.65                       1.10

As the table shows,* for $.1 > \gamma > .008$

$$x > \underline{x} > y$$

* Actually, the inequality in question is shown only for the values of $\gamma$ given in the
table. However it can be verified that the inequality remains valid for all values of $\gamma$
between .1 and .008.








and hence 

No > Max ASN (for sufficiently small 

The statement and the table above are not meant to delimit the region in which 
the sequential rule is superior to the non-sequential procedure. 
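
The tabulated bounds can be recomputed as a check. The sketch below rests on assumptions read off from the table rather than from the damaged formulas: $x = \frac{\Delta}{2\sigma}\sqrt{N_0}$, $y = \frac{\Delta}{2\sigma}\sqrt{\text{Max ASN}}$ with Max ASN taken as $h^2/\sigma^2$, and $\Phi$ in (7.1) taken as the normal integral from 0, so that in terms of the ordinary distribution function the lower bound solves $\Phi(x) = 1 - \gamma$ and the upper bound solves $\Phi(x) = 1 - \gamma/2$. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def x_bounds_and_y(gamma):
    """Return (lower bound for x, upper bound for x, y) for a common
    error level gamma in the symmetric case A = 1/B = (1 - gamma)/gamma."""
    inv = NormalDist().inv_cdf
    x_lower = inv(1.0 - gamma)          # first term of (7.1) replaced by its limit
    x_upper = inv(1.0 - gamma / 2.0)    # both terms of (7.1) taken equal
    y = 0.5 * math.log((1.0 - gamma) / gamma)
    return x_lower, x_upper, y

for g in (0.005, 0.008, 0.01, 0.05, 0.1):
    lo, hi, y = x_bounds_and_y(g)
    print(f"{g:<6} {lo:.2f}-{hi:.2f}  {y:.2f}")
```

Since $N_0$ and Max ASN are both proportional to the squares of $x$ and $y$, the comparison $\underline{x} > y$ is the same as $N_0 >$ Max ASN under these assumptions.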

REFERENCES 

[1] Abraham Wald, Sequential Analysis, John Wiley and Sons, 1947. 

[2] P. Armitage, “Some sequential tests of Student's hypothesis,” Supplement to the Journal
of the Royal Statistical Society, Vol. 9 (1947), No. 2, p. 250.





MOMENTS OF RANDOM GROUP SIZE DISTRIBUTIONS¹

By John W. Tukey
Princeton University
1. Summary. A number of practical problems involve the solution of a mathe¬ 
matical problem of the class described in the classical language of probability 
theory as follows: “A number of balls are independently distributed among a 
number of boxes, how many boxes contain no balls, 1 ball, 2 balls, 3 balls, and 
so on.” Problems arising in the oxidation of rubber and the genetics of bacteria 
are discussed as applications. 

A method is given of solving problems of this sort when “how many” is
adequately answered by the calculation of means, variances, covariances, third
moments, etc. The method is applied to a number of the simplest cases, where
the number of balls is fixed, binomially distributed or Poisson and where the
“sizes” of the boxes are equal or unequal.

2. Introduction. The distribution of the number of empty boxes has been 
investigated by Romanovsky in 1934 [3], and, apparently independently, by 
Stevens in 1937 [4]. Romanovsky investigated the case of N equal boxes and 
m balls for (i) the case where the balls are independent, and (ii) the case where 
there is a limit to the size of each box. He gives no motivation for the problem, 
and shows that certain limiting distributions approach normality. Stevens 
investigated the case of m independent balls for N boxes (i) of equal size, and 
(ii) of unequal size, and developed a useful approximation for the last case. 
Stevens was concerned with this problem in order to test box counts for non¬ 
randomness by comparing the number of empty boxes with expectation. The 
reader interested in that problem is referred to his paper. 

The results derived in Part II are based on the use of a chance generating
function, a technique which applies easily to the case where the balls are independent.
Thus Romanovsky's results for the case of boxes of limited size are
neither included nor extended. For the other cases where the number of empty
boxes has been considered, the results below seem to provide simple moments
and cross-moments for the numbers of boxes with any number of balls to the
extent previously available for the number of empty boxes. Both Romanovsky
and Stevens investigated the actual distribution of the number of empty boxes. A
similar investigation of the distribution of the number of b-ball boxes has not
been carried out here.

3. A chemical problem. In studying the oxidation of rubber, Tobolsky and
coworkers were led to propose the following problem: “If a mass of rubber
originally consisted of N chains of equal length, if each chain can be broken at a
^ Prepared in connection with research sponsored by the Office of Naval Research. 



large number of places by the reaction with one oxygen molecule, if there are m
oxygen molecules each equally likely to react at each link, and if mNp molecules
have reacted, what is the probable number of original chains which are now in
b + 1 parts as a result of b oxygen molecules having reacted with b of their links?”

Here an original chain plays the role of a box and an oxygen molecule the 
role of a ball. The sort of numbers which may be taken as characteristic are: 

N — 10 ** (number of chains), 

tn = 10 ** to 10 ** (number of oxygen molecules), 
mp = 0.01 to 100 (average breaks/chain). 

Thus it is almost certainly going to be appropriate to use the results obtained by 
assuming N and m very large and p = l/N very small. We shall return to this 
example after discussing the general results. 

4. A bacteriological problem. The experiments of Newcombe [ 1 ] on the 
irradiation and mutation of bacteria have prompted Pittendrigh* to propose the 
following problem: “Suppose a large number of bacteria each contain m enzyme 
particles, which have been formed by the action of a nuclear gene. Suppose 
that irradiation destroys the nuclear gene in a certain fraction of the bacteria. 
Suppose three generations to occur, during which the m original enzyme particles 
are randomly distributed among the 8 descendants of an original bacterium. 
If a bacterium without either nuclear gene or enzyme particle is a recognizable 
mutant, what is the expected distribution of “families” with 0, 1, 2, 3, • • • , 8 
mutants?” 

Here the enzyme particles are the balls, and the 8 descendants are the N 
boxes. We are interested in the number of empty boxes—the problem is that 
discussed by both Romanovsky and Stevens, with the exception of an allowance 
for cases where the nuclear gene was not lost. We shall return to this problem 
also after discussing the general results. 

5. The case of large numbers. In case the number of “balls” and “boxes” is
large, it is natural and has been customary in similar problems to replace discrete
variables by continuous, and derive differential equations. The process runs as
follows: Let $y_0, y_1, y_2, \ldots, y_b, \ldots$ be the fractions of the total number of boxes
containing no, one, two, ..., b, ... balls. Let $t$ be the average number of balls
per box (artificially made continuous, so that we may, for example, have a total
of $13 + 3\pi$ balls). Increase $t$ to $t + dt$, then of the $y_0$ boxes previously containing
no balls, $y_0\,dt$ will receive one. Of the $y_1$ boxes previously containing one ball each,
$y_1\,dt$ will receive a second, and so on. Hence









Poisson distribution, the caution suggested by (I) is often shown unnecessary 
by (II). For this type of problem the differential equation is entirely adequate! 

It is further shown in Part II that, in the Poisson case, the second moments
are exactly those which correspond to random sampling from an infinite population
with the fractions indicated by the mean number of boxes with 0, 1, 2, ...,
b, ... balls. This result is not accidental, and it is shown in Part III how we can
see directly that the whole distribution in this case is that of random sampling
from such a population.

6. The case of small numbers. The results of Part II also allow us to state 
the means, variances, and covariances, for the cases where the differential 
equations do not apply. The results are set forth in the following tables: Tables 1 
and 2 apply to the cases where m balls are distributed among the given boxes 
and possibly others. Thus the total number of balls in the given boxes is either 
fixed, when there are no other boxes, or follows a binomial distribution. 


TABLE 2 

A fixed or binomial number of balls and unequal boxes 


HYPOTHESIS 

A total of m balls are independently distributed into N boxes or elsewhere, the
chance of a particular ball entering the ith box being $p_i$. The average of the
$p_i = p$. The sum of the squared fractional deviations of $p_i$ from $p$ is $A$.
$p_i = p(1 + x_i)$, $\Sigma_i x_i^2 = A$. Terms in $\Sigma_i x_i^3$, $\Sigma_i x_i^4$, etc. are to be neglected. The
number of boxes each containing exactly b balls is $n_b$.


Mean of = E{n^ 


2iV(l - v)V ^ 


p)”* times 


((mp — by — (m — b)p^ — 6(1 — p)“) 


I Variances and covariances as in Table 1, using 


«•'«(■- J) (' - (ri-JT (^r (' • S) 


where = 26c ^2p — + terms in p* and in ^ 

The exact value of 4/ is given in Section 10. 


Mean of no = Nil - p)” 1 + - 


Ap^mim — 1 )^ 


2Nil - p)» 


Mean of ni = Nmil — p)”^‘ p (1 + 


Aim — l)p(l — mp)' 
mi - p)» , 







TABLE 3

Poisson balls and equal boxes

HYPOTHESIS

A number of balls with the Poisson distribution, and expectation $Nt$, are
independently placed in $N$ boxes. The number of boxes each containing
exactly $b$ balls is $n_b$.

Mean of $n_b$ = $E(n_b) = N\,\frac{t^b}{b!}\,e^{-t}$
Variance of $n_b$ = $N\,\frac{t^b}{b!}\,e^{-t}\left(1 - \frac{t^b}{b!}\,e^{-t}\right)$
Covariance of $n_b$ and $n_c$ = $-N\,\frac{t^b}{b!}\,\frac{t^c}{c!}\,e^{-2t}$

Mean of $n_0$ = $Ne^{-t}$,
Mean of $n_1$ = $Nte^{-t}$,
Variance of $n_0$ = $Ne^{-t}(1 - e^{-t})$,
Variance of $n_1$ = $Nte^{-t}(1 - te^{-t})$,
Covariance of $n_0$ and $n_1$ = $-Nte^{-2t}$.

7. Discussion of the chemical problem. The number of oxygen molecules
which have reacted in a given time is, at best, distributed Poisson. Thus the
differential equations would give the expected number of cuts, even if the
number of balls or boxes were not large.

The fact that the numbers of balls and boxes are large makes the variances
and covariances so small as to be practically unimportant. Thus, for example,
with $N = 10^{18}$, $t = 1$ (1 break per chain), we have:

mean of $n_0$ = $\frac{1}{e} \times 10^{18}$,

mean of $n_1$ = $\frac{1}{e} \times 10^{18}$,

variance of $n_0$ = $\frac{1}{e}\left(1 - \frac{1}{e}\right) \times 10^{18}$,

variance of $n_1$ = $\frac{1}{e}\left(1 - \frac{1}{e}\right) \times 10^{18}$,

covariance of $n_0$ and $n_1$ = $-\frac{1}{e^2} \times 10^{18}$.

Thus the standard deviations are less than 1 part in 100 million of the mean.
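
The size of these moments is easy to confirm numerically. The following sketch assumes the Poisson case moments in the multinomial-like form mean $Nq_b$, variance $Nq_b(1 - q_b)$ and covariance $-Nq_bq_c$, with $q_b = t^b e^{-t}/b!$; the function names and the value $N = 10^{18}$ are illustrative assumptions.

```python
import math

def poisson_fraction(t, b):
    """Expected fraction of boxes holding exactly b balls: t^b e^-t / b!."""
    return t ** b * math.exp(-t) / math.factorial(b)

def box_count_moments(N, t, b, c):
    """Means, variances and covariance of n_b and n_c for Poisson balls
    placed independently in N equal boxes, t balls per box on average."""
    qb, qc = poisson_fraction(t, b), poisson_fraction(t, c)
    return {
        "mean_b": N * qb,
        "mean_c": N * qc,
        "var_b": N * qb * (1.0 - qb),
        "var_c": N * qc * (1.0 - qc),
        "cov_bc": -N * qb * qc,
    }

m = box_count_moments(1e18, 1.0, 0, 1)
ratio = math.sqrt(m["var_b"]) / m["mean_b"]   # relative standard deviation of n_0
print(ratio)
```

The relative standard deviation equals $\sqrt{(1 - q_b)/(N q_b)}$, so it shrinks like $1/\sqrt{N}$.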






TABLE 4 

Poisson balls and varied boxes 









The results are

mean number of mutants = $E(n_0) = 8\left(\tfrac{7}{8}\right)^m$,
variance of same = $8\left(\tfrac{7}{8}\right)^m - 64\left(\tfrac{7}{8}\right)^{2m} + 56\left(\tfrac{3}{4}\right)^m$.

For small values of m we get the values tabled below:

TABLE 5
Blanks out of 8

m     mean     variance    mean(1 - mean/8)
0     8        0.000       0.000
1     7        0.000       0.875
2     6.125    .109        1.436
3     5.359    .262        1.769
4     4.689    .417        1.941
5     4.103    .556        1.998
6     3.590    .666        1.979
7     3.142    .747        1.908
8     2.749    .799        1.804
9     2.405    .825        1.682
10    2.105    .829        1.551
15    1.079    .663        .934
20    0.554    .426        .515


We notice that the variance is substantially less than the mean. 
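The entries of Table 5 follow from the mean and variance formulas with $N = 8$ and $p = 1/8$. The sketch below recomputes them under that assumption; the function name is illustrative, and the last returned value is the variance for a Poisson number of particles with the same mean, taken as mean(1 - mean/8).

```python
def blanks_out_of_eight(m, N=8):
    """Mean and variance of the number of empty boxes when m balls fall
    independently and uniformly into N boxes, together with the variance
    that a Poisson number of balls would give for the same mean."""
    p = 1.0 / N
    mean = N * (1 - p) ** m
    second_moment = N * (N - 1) * (1 - 2 * p) ** m + mean  # E(n_0^2)
    variance = second_moment - mean ** 2
    poisson_variance = mean * (1 - mean / N)
    return mean, variance, poisson_variance

for m in (0, 1, 2, 3, 4, 5, 10, 15, 20):
    mean, variance, poisson_variance = blanks_out_of_eight(m)
    print(f"{m:2d} {mean:6.3f} {variance:6.3f} {poisson_variance:6.3f}")
```

The printed rows should agree with the table to within rounding in the last figure.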

Now it might be that the number of enzyme particles is not constant from 
bacterium to bacterium. It would not be unreasonable if it had a Poisson dis¬ 
tribution. If this were the case, we would revert to the differential equation 
solution, which is also given in Table 3. The last column in Table 5 shows the 
variance which would then arise for the same means. The variance is still some¬ 
what less than the mean. The situation is shown graphically in Figure 1. 

If the actual distribution of $n_0$ is desired, then it can be calculated for the
case where m is fixed from the tables in Stevens’ paper [4], and when m is
distributed Poisson it is merely a binomial distribution.

PART II 

DERIVATIONS 

9. The dinnrw generating function. We are considering the following class of 
problems: “balls” are placed independently in “boxes” and then the number $n_0$
of empty compartments, the number $n_1$ of compartments containing exactly
one ball, ..., the number $n_b$ of boxes with exactly $b$ balls, and so on, are observed.
We are interested in the moments of $n_0, n_1, n_2, \ldots, n_b, \ldots$, both simple
and mixed.

Figure 1. Ratio of variance to mean for number of empty boxes out of eight.

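We define chance quantities $x_{iq}$ by

$x_{iq} = \begin{cases} x, & q\text{th ball in the } i\text{th box},\\ 1, & \text{otherwise}. \end{cases}$

Clearly the product of all $x_{iq}$ for fixed $i$ is given by

$\prod_q x_{iq} = x^{(\text{number of balls in the } i\text{th box})}.$

Thus $\prod_q x_{iq} = x^b$ if and only if there are exactly $b$ balls in the $i$th box. Hence
the coefficient of $x^b$ in $\sum_i \prod_q x_{iq}$, the sum of $\prod_q x_{iq}$ over all boxes $i$, is $n_b$, the
number of boxes containing exactly $b$ balls.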




We have the relation

$\sum_b n_b x^b = f(x) = \sum_i \prod_q x_{iq},$

where $f(x)$ is a chance function, and the $n_b$ and the $x_{iq}$ are chance quantities.

Now we take expectations of both sides, and use the fact that the expectation
of a sum is the sum of the expectations to obtain

$\sum_b x^b E(n_b) = E(f(x)) = \sum_i E\left(\prod_q x_{iq}\right).$

Now $x_{iq}$ and $x_{ir}$, for $q \ne r$, are independent since they are determined by
different and independent balls. Hence $E\left(\prod_q x_{iq}\right) = \prod_q E(x_{iq})$ and we have the
basic formula

(1)  $E(f(x)) = \sum_b x^b E(n_b) = \sum_i \prod_q E(x_{iq}).$

10. Higher moments. By extending this device, we can obtain generating
functions for higher moments. Instead of the $x_{iq}$, we introduce a whole sequence
of chance quantities $x_{iq}, y_{iq}, z_{iq}, \ldots, w_{iq}$, defined by

$(x_{iq}, y_{iq}, \ldots, w_{iq}) = \begin{cases} (x, y, \ldots, w), & q\text{th ball in } i\text{th box},\\ (1, 1, \ldots, 1), & \text{otherwise}. \end{cases}$

We find immediately that

$f(x)f(y)\cdots f(w) = \left(\sum_i \prod_q x_{iq}\right)\left(\sum_j \prod_r y_{jr}\right)\cdots\left(\sum_n \prod_p w_{np}\right)$

$\qquad = \sum_i \sum_j \cdots \sum_n \prod_q x_{iq}\,y_{jq}\cdots w_{nq}.$

Taking expectations on both sides

$E(f(x)f(y)\cdots f(w)) = \sum_i \sum_j \cdots \sum_n E\left(\prod_q x_{iq}\,y_{jq}\cdots w_{nq}\right)$

$\qquad = \sum_i \sum_j \cdots \sum_n \prod_q E(x_{iq}\,y_{jq}\cdots w_{nq}),$

where we have used the fact that $x_{iq}y_{jq}\cdots w_{nq}$ and $x_{ir}y_{jr}\cdots w_{nr}$ are independent
when $q \ne r$ since they are determined by different and independent balls.

On the other hand,

$f(x)f(y)\cdots f(w) = \left(\sum_b n_b x^b\right)\left(\sum_c n_c y^c\right)\cdots\left(\sum_a n_a w^a\right)$

$\qquad = \sum_b \sum_c \cdots \sum_a (n_b n_c \cdots n_a)(x^b y^c \cdots w^a)$

so that

$E(f(x)f(y)\cdots f(w)) = \sum_b \sum_c \cdots \sum_a (x^b y^c \cdots w^a)\,E(n_b n_c \cdots n_a).$

Equating the two expressions for the expectation of $f(x)f(y)\cdots f(w)$, we have,
finally, the generating function for $E(n_b n_c \cdots n_a)$ in the form

(2)  $\sum_b \sum_c \cdots \sum_a (x^b y^c \cdots w^a)\,E(n_b n_c \cdots n_a) = \sum_i \sum_j \cdots \sum_n \prod_q E(x_{iq}\,y_{jq}\cdots w_{nq}).$

Thus a knowledge of $E(x_{iq}\,y_{jq}\cdots w_{nq})$ will allow us to determine the moments
of the n’s.





11. A fixed or binomial number of balls and equal boxes. Let there be N
boxes, and m balls, each with probability p of entering each box. If pN = 1 we
have the case where m balls always appear in the boxes taken together, that is, the
case of a fixed number of balls. If pN < 1, the number of balls appearing in all
boxes taken together is a binomial with expectation mpN.

Now $x_{iq}$ equals 1 with probability 1 - p and equals x with probability p,
hence (1) becomes

$\sum_b x^b E(n_b) = \sum_i \prod_q (1 - p + px) = N(1 - p + px)^m.$

Using the binomial theorem, the coefficient of $x^b$ is

(3)  $E(n_b) = N\binom{m}{b}(1-p)^{m-b}p^b = N\binom{m}{b}(1-p)^m\left(\dfrac{p}{1-p}\right)^b.$

Now if p is small, we may approximate 1 - p by $e^{-p}$ and by 1, respectively,
in its two occurrences, whence

$E(n_b) \approx N\binom{m}{b}p^b e^{-mp},$

and if m is large compared to b this becomes

$E(n_b) \approx N\,\dfrac{(mp)^b}{b!}\,e^{-mp}.$
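Formula (3) can be checked exactly for small cases by listing every placement of the balls. The sketch below assumes the fixed-number case pN = 1, so that all $N^m$ placements are equally likely; the function names are illustrative.

```python
from fractions import Fraction
from itertools import product
from math import comb

def expected_counts_by_enumeration(N, m):
    """E(n_b) for b = 0..m, found by enumerating all N**m equally likely
    placements of m balls into N boxes (the case pN = 1)."""
    totals = [0] * (m + 1)
    for placement in product(range(N), repeat=m):
        occupancy = [0] * N
        for box in placement:
            occupancy[box] += 1
        for count in occupancy:
            totals[count] += 1  # one more box holding `count` balls
    return [Fraction(total, N ** m) for total in totals]

def expected_counts_by_formula(N, m):
    """Formula (3) with p = 1/N, in exact rational arithmetic."""
    p = Fraction(1, N)
    return [N * comb(m, b) * (1 - p) ** (m - b) * p ** b for b in range(m + 1)]

print(expected_counts_by_enumeration(3, 4) == expected_counts_by_formula(3, 4))
```

Exact fractions are used so that the comparison is an identity rather than a floating-point approximation.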

12. Second moments. We must study $E(x_{iq}y_{jq})$. If $i = j$ then this is
$(1 - p + pxy)$ since the qth ball falls into both the ith and jth boxes with probability
p, otherwise into neither. If $i \ne j$, we immediately find the expectation
to be $(1 - 2p + px + py)$.

Hence, since $i = j$ in N cases, and $i \ne j$ in N(N - 1) cases,

$\sum_{i,j} \prod_q E(x_{iq}y_{jq}) = N(1 - p + pxy)^m + N(N-1)(1 - 2p + px + py)^m;$

by (2) this equals $\sum_{b,c} x^b y^c E(n_b n_c)$, and using the multinomial expansion we find

$E(n_b n_c) = N(N-1)\binom{m}{b\;c}(1-2p)^{m-b-c}p^{b+c} + \delta(b,c)\,N\binom{m}{b}(1-p)^{m-b}p^b,$

where $\delta(b, c) = 1$ when $b = c$ and is zero otherwise, and where the multinomial
coefficient is given by

$\binom{m}{b\;c} = \dfrac{m!}{b!\,c!\,(m-b-c)!}.$

We now set

(4)  $E(n_b n_c) = E(n_b)E(n_c)\,\Phi(b,c) + \delta(b,c)E(n_b),$

when, dividing the first term of $E(n_b n_c)$ by $E(n_b)E(n_c)$ as given by (3),

(5)  $\Phi(b,c) = \left(1 - \dfrac{1}{N}\right)\dfrac{(1-2p)^{m-b-c}}{(1-p)^{2m-b-c}}\cdot\dfrac{(m-c)^{(b)}}{m^{(b)}},$

where $u^{(b)} = u(u-1)\cdots(u-b+1)$ denotes a descending factorial with b
factors.

Notice that, if the $n_b$ were independently distributed in Poisson distributions,
the second moments would be given by the same formula with $\Phi(b, c) = 1$,
while if they were distributed like a multinomial sample from an infinite population
the second moments would be given by the same formula with $\Phi(b, c) = 1 - \dfrac{1}{N}$.

For small p, we have

$\Phi(b,c) \approx \left(1 - \dfrac{1}{N}\right)\dfrac{(m-c)^{(b)}}{m^{(b)}},$

and if m is large compared to b and c, this approaches the multinomial value

$\Phi(b,c) \approx 1 - \dfrac{1}{N}.$
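The second-moment formula of this section can also be confirmed by brute force for small N and m. The sketch below again assumes pN = 1 and compares the enumerated value of $E(n_b n_c)$ with the sum of the term for two different boxes and the term for one and the same box; the function names are illustrative.

```python
from fractions import Fraction
from itertools import product
from math import factorial

def second_moment_by_enumeration(N, m, b, c):
    """E(n_b * n_c) over all N**m equally likely placements (case pN = 1)."""
    total = 0
    for placement in product(range(N), repeat=m):
        occupancy = [0] * N
        for box in placement:
            occupancy[box] += 1
        total += occupancy.count(b) * occupancy.count(c)
    return Fraction(total, N ** m)

def second_moment_by_formula(N, m, b, c):
    """N(N-1) * multinomial * (1-2p)^(m-b-c) * p^(b+c)
    plus delta(b,c) * N * binomial * (1-p)^(m-b) * p^b, with p = 1/N."""
    p = Fraction(1, N)
    value = Fraction(0)
    if b + c <= m:  # two different boxes cannot hold more than m balls together
        multinomial = factorial(m) // (factorial(b) * factorial(c) * factorial(m - b - c))
        value += N * (N - 1) * multinomial * (1 - 2 * p) ** (m - b - c) * p ** (b + c)
    if b == c:      # contribution of i = j
        binomial = factorial(m) // (factorial(b) * factorial(m - b))
        value += N * binomial * (1 - p) ** (m - b) * p ** b
    return value

N, m = 3, 4
print(all(second_moment_by_enumeration(N, m, b, c) == second_moment_by_formula(N, m, b, c)
          for b in range(m + 1) for c in range(m + 1)))
```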


13. Variances and covariances. The variances and covariances are given by

Variance $(n_b) = E(n_b n_b) - E(n_b)E(n_b)$

$\qquad = E(n_b)\left(1 - (1 - \Phi(b,b))E(n_b)\right),$

and

Covariance $(n_b, n_c) = -(1 - \Phi(b,c))E(n_b)E(n_c).$

Thus the covariance of $n_b$ and $n_c$ will vanish when, and only when $\Phi(b, c) = 1$.

Let us suppose pN = 4, with p small and m and N large, and see if $(6, c) 
p 

can be unity. Since a preliminary calculation shows it to be reasonable, let us 
put m = yN. Then 

Hb. c)«(1 - pp) (1 - + p)*^. 





An easy calculation shows that the ratio of descending factorials is nearly

$e^{-bc/\gamma N};$

making further natural approximations,

In 4>(6, c) ^ — fip — — p — yNp^ + (5 + c)p 

7 

and this may be written 

In <^( 6 , c) X -^(^( 2 ^ - b - c + + 4pc - 0> - 

and this vanishes for real 7 when and only when | 6 — ^ — c | > This, 

then, is the condition on b and c which permits the existence of two ratios of m
to N so that for either ratio and large N there will be no correlation between
$n_b$ and $n_c$.


14. Higher moments. To deal with the third moments, we need $E(x_{iq}y_{jq}z_{kq})$,
which is easily seen to behave as follows:

Relation of i, j, k      Number of occurrences      Expectation of $x_{iq}y_{jq}z_{kq}$
$i = j = k$                $N$                          $1 - p + pxyz$
$i = j \ne k$              $N(N-1)$                     $1 - 2p + pxy + pz$
$i = k \ne j$              $N(N-1)$                     $1 - 2p + pxz + py$
$j = k \ne i$              $N(N-1)$                     $1 - 2p + pyz + px$
different                $N(N-1)(N-2)$                $1 - 3p + px + py + pz$

Thus we have

$\sum_{b,c,d} x^b y^c z^d E(n_b n_c n_d) = N(1 - p + pxyz)^m + N(N-1)(1 - 2p + pxy + pz)^m$

$\qquad + N(N-1)(1 - 2p + pxz + py)^m + N(N-1)(1 - 2p + pyz + px)^m$

$\qquad + N(N-1)(N-2)(1 - 3p + px + py + pz)^m$

from which we can calculate all third moments.

In general if $\epsilon$ is a decomposition of the product $xyz\cdots w$ into $a$ monomials
$u_1, u_2, \ldots, u_a$, where order is disregarded (for example: $xyz = (xz)y = (zx)y =
y(zx) = y(xz)$ is a single decomposition with $a = 2$, $u_1 = xz$, $u_2 = y$), then the
generating function becomes

$\sum_{\epsilon} N^{(a)}\left(1 + (u_1 + u_2 + \cdots + u_a - a)p\right)^m.$

15. Poisson balls and equal boxes. To reach a Poisson distribution we let
$m \to \infty$ and $p \to 0$ so that $mNp = tN$, where t is the average number of balls
per box in the Poisson distribution.

Since

$\binom{m}{b} \approx \dfrac{m^b}{b!}$

for large m, so that

$\binom{m}{b}p^b \to \dfrac{t^b}{b!} \quad\text{and}\quad (1-p)^{m-b} \to e^{-t}$

under these conditions, (3) becomes

(6)  $E(n_b) = N\,\dfrac{t^b}{b!}\,e^{-t}$

and from (5) it follows that the limit of $\Phi(b, c)$ is

(7)  $\Phi(b,c) = 1 - \dfrac{1}{N}$

and hence

$E(n_b n_c) = N(N-1)\,\dfrac{t^{b+c}}{b!\,c!}\,e^{-2t} + \delta(b,c)\,N\,\dfrac{t^b}{b!}\,e^{-t},$

(8)  Variance $(n_b) = N\,\dfrac{t^b}{b!}\,e^{-t}\left(1 - \dfrac{t^b}{b!}\,e^{-t}\right),$

(9)  Covariance $(n_b, n_c) = -N\,\dfrac{t^{b+c}}{b!\,c!}\,e^{-2t}.$

Notice that these are the moments of the numbers of objects of types $b, c, \ldots$,
in a random sample of N from an infinite population where the fraction of
b’s is $t^b e^{-t}/b!$, just as it should be.
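The passage to the Poisson limit can be watched numerically by holding mp = t fixed and letting m grow. The sketch below compares the binomial mean (3) with the Poisson mean $N t^b e^{-t}/b!$; the values of N, t and b are arbitrary choices for illustration.

```python
from math import comb, exp, factorial

def mean_binomial(N, m, p, b):
    """E(n_b) for m balls, each entering any given box with probability p (formula (3))."""
    return N * comb(m, b) * (1 - p) ** (m - b) * p ** b

def mean_poisson(N, t, b):
    """Limiting value of E(n_b) with t the average number of balls per box."""
    return N * t ** b * exp(-t) / factorial(b)

N, t, b = 100, 2.0, 3  # illustrative values, not taken from the paper
for m in (10, 100, 1000, 10000):
    print(m, mean_binomial(N, m, t / m, b), mean_poisson(N, t, b))
```

The difference between the two columns should shrink roughly in proportion to 1/m.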


16. Fixed or binomial balls and varied boxes. We now consider the case
where the chance of any ball entering the ith box is $p_i$. We shall again not
restrict ourselves to the case $\sum_i p_i = 1$.

The expectation of $x_{iq}$ is immediately seen to be $(1 + p_i(x - 1)) =
(1 - p_i + p_i x)$, so that the generating function is

$f(x) = \sum_i (1 - p_i + p_i x)^m$

and the expectation of $n_b$ is

(10)  $E(n_b) = \binom{m}{b}\sum_i (1-p_i)^{m-b}p_i^b = \binom{m}{b}\sum_i (1-p_i)^m\left(\dfrac{p_i}{1-p_i}\right)^b.$

Following Stevens [4] with a slight modification, let us set $p_i = p(1 + \lambda_i)$,
where p is the average of the $p_i$, so that $\sum_i \lambda_i = 0$. Then

$(1 - p_i) = (1 - p(1 + \lambda_i)) = (1 - p)\left(1 - \dfrac{p\lambda_i}{1-p}\right)$

so that

$\sum_i (1-p_i)^{m-b}p_i^b = (1-p)^{m-b}p^b \sum_i \left(1 - \dfrac{p\lambda_i}{1-p}\right)^{m-b}(1+\lambda_i)^b.$





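Expanding the summand, we find

$1 + \left(b - \dfrac{(m-b)p}{1-p}\right)\lambda_i + \left\{\dfrac{(m-b)(m-b-1)p^2}{2(1-p)^2} - \dfrac{(m-b)bp}{1-p} + \dfrac{b(b-1)}{2}\right\}\lambda_i^2 + O(\lambda_i^3).$

Hence, setting $\sum_i \lambda_i^2 = A$ (notice this is not the same as Stevens’ A!), we have

$E(n_b) = \binom{m}{b}(1-p)^{m-b}p^b\left\{N + A\left[\dfrac{(m-b)(m-b-1)p^2}{2(1-p)^2} - \dfrac{(m-b)bp}{1-p} + \dfrac{b(b-1)}{2}\right] + \cdots\right\}.$

The expectation for all $p_i = p$ has been modified by multiplication by

(11)  $1 + \dfrac{A}{2N}\left\{\dfrac{m-b}{m-b-1}\cdot\dfrac{(p(m-1)-b)^2}{(1-p)^2} - \dfrac{b(m-1)}{m-b-1}\right\}$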


plus terms of higher order. For large N and consequently small p the quantity in 
braces is nearly 




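and more roughly is approximately $b^2$. Similarly, the expectations of second
moments are

$E(n_b n_c) = \binom{m}{b\;c}\sum_{i \ne j}(1 - p_i - p_j)^{m-b-c}p_i^b p_j^c + \delta(b,c)\binom{m}{b}\sum_i (1-p_i)^{m-b}p_i^b,$

whence

(12)  $\Phi(b,c) = \dfrac{\binom{m}{b\;c}\sum_{i \ne j}(1 - p_i - p_j)^{m-b-c}p_i^b p_j^c}{\binom{m}{b}\binom{m}{c}\sum_i (1-p_i)^{m-b}p_i^b\,\sum_j (1-p_j)^{m-c}p_j^c}.$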

Making the same sort of expansion yields 

(13) ♦«,, 0) (l - (l + ii) 

where terms in S\* have been neglected (note that 

52 X< = —A«), 

i 

and where 


{m 


m — b -- c AT — 2/- r »\-2 — 

(1 - 2p) - 


c - 1 iV 


m 


^( 1 -.)-■} 

• {p(m — 1) — b}* 





f_^ 

\m — 


— b — c N — 2 _ x -2 

i, - c -1 Ann » - - 


•{p(rM - 1 ) - c}^ 


1 m — 6 — c \ N iV — 2 


2m — 6 — c— llA^— 1 iV” 


(1 - 2p)“^^ (6 -- c) 


m — 6 — c— 1\A^— 1 m — 6— 1 m — c— Ij 
This can be reduced to 

2ic(2p-i) + 0(p') + 0(^). 

and for p - 1/JV + 0(p*) + 

= 2p6c + O(p’). 
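The first-moment approximation of this section can be compared with the exact sum (10) for a specific set of unequal boxes. The sketch below is written under stated assumptions: the factor (11) is read as $1 + \frac{A}{2N}\left\{\frac{m-b}{m-b-1}\frac{(p(m-1)-b)^2}{(1-p)^2} - \frac{b(m-1)}{m-b-1}\right\}$, which is what the second-order expansion of the summand gives, and the box probabilities are invented for illustration.

```python
from math import comb

def exact_mean(m, b, probs):
    """Formula (10): E(n_b) when the i-th box has entry probability probs[i]."""
    return comb(m, b) * sum((1 - p_i) ** (m - b) * p_i ** b for p_i in probs)

def approximate_mean(m, b, probs):
    """Equal-box value of E(n_b) multiplied by the second-order factor (11)."""
    N = len(probs)
    p = sum(probs) / N                            # average entry probability
    A = sum((p_i / p - 1) ** 2 for p_i in probs)  # A = sum of lambda_i squared
    braces = ((m - b) / (m - b - 1)) * (p * (m - 1) - b) ** 2 / (1 - p) ** 2 \
        - b * (m - 1) / (m - b - 1)
    equal_boxes = N * comb(m, b) * (1 - p) ** (m - b) * p ** b
    return equal_boxes * (1 + A / (2 * N) * braces)

# Hypothetical relative deviations lambda_i, chosen to sum to zero.
lam = [0.2, -0.1, -0.1, 0.05, -0.05, 0.1, -0.1, 0.0, 0.03, -0.03]
probs = [0.05 * (1 + x) for x in lam]
print(exact_mean(30, 2, probs), approximate_mean(30, 2, probs))
```

Because the neglected terms are of third order in the $\lambda_i$, the two values should agree closely when the deviations are modest.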

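17. Poisson balls and varied boxes. To reach the Poisson limit, we let $m \to \infty$
and $p_i \to 0$ so that $mp_i = t_i$. The generating function for first moments becomes

$f(x) = \sum_i e^{-t_i(1-x)}$

and the expectation of $n_b$ is

$E(n_b) = \sum_i \dfrac{t_i^b}{b!}\,e^{-t_i}.$

If we set $t_i = t(1 + \lambda_i)$, this becomes

$E(n_b) = \dfrac{t^b}{b!}\,e^{-t}\sum_i (1+\lambda_i)^b e^{-t\lambda_i}.$

The summand expands in the form

$\left(1 + b\lambda_i + \dfrac{b(b-1)}{2}\lambda_i^2 + \cdots\right)\left(1 - t\lambda_i + \dfrac{t^2}{2}\lambda_i^2 - \cdots\right)$

$\qquad = 1 + (b - t)\lambda_i + \dfrac{(b-t)^2 - b}{2}\,\lambda_i^2 + \cdots$

If t is chosen as the average of the $t_i$ so that $\sum_i \lambda_i = 0$, the sum becomes

$N + \dfrac{(b-t)^2 - b}{2}\sum_i \lambda_i^2 + \cdots$

Again setting $\sum_i \lambda_i^2 = A$ we have

(16)  $E(n_b) \approx \dfrac{t^b}{b!}\,e^{-t}\left(N + \dfrac{A}{2}\left((b-t)^2 - b\right)\right)$

which can be written

$E(n_b) \approx N\,\dfrac{t^b}{b!}\,e^{-t}\left(1 + \dfrac{A}{2N}\left((b-t)^2 - b\right)\right).$

The generating function for the second moments is

$f(x)f(y) = \sum_i e^{-t_i(1-xy)} + \sum_{i \ne j} e^{-t_i(1-x) - t_j(1-y)}$

so that the expectation of $n_b n_c$ is

(17)  $E(n_b n_c) = \sum_{i \ne j}\dfrac{t_i^b\,t_j^c}{b!\,c!}\,e^{-t_i - t_j} + \delta(b,c)E(n_b),$

which becomes

$E(n_b n_c) = \dfrac{t^{b+c}}{b!\,c!}\,e^{-2t}\sum_{i \ne j}(1+\lambda_i)^b(1+\lambda_j)^c e^{-t(\lambda_i + \lambda_j)} + \delta(b,c)E(n_b),$

whence we can derive

(18)  $\Phi(b,c) \approx 1 - \dfrac{1}{N} - \dfrac{A}{N^2}(b-t)(c-t).$

Thus

(19)  Variance $(n_b) \approx E(n_b) - \left(\dfrac{1}{N} + \dfrac{A}{N^2}(b-t)^2\right)(E(n_b))^2,$

(20)  Covariance $(n_b, n_c) \approx -\left(\dfrac{1}{N} + \dfrac{A}{N^2}(b-t)(c-t)\right)E(n_b)E(n_c).$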
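The approximations of this section can be set against exact Poisson sums for a small example. The sketch below assumes the readings $E(n_b) \approx N\frac{t^b}{b!}e^{-t}\left(1 + \frac{A}{2N}((b-t)^2 - b)\right)$ and $\Phi(b,c) \approx 1 - \frac{1}{N} - \frac{A}{N^2}(b-t)(c-t)$; the box means $t_i$ are invented for illustration.

```python
from math import exp, factorial

def poisson_term(t, b):
    """Probability that a box with Poisson mean t holds exactly b balls."""
    return t ** b * exp(-t) / factorial(b)

def exact_mean(ts, b):
    return sum(poisson_term(t_i, b) for t_i in ts)

def approx_mean(ts, b):
    N = len(ts)
    t = sum(ts) / N
    A = sum((t_i / t - 1) ** 2 for t_i in ts)  # A = sum of lambda_i squared
    return N * poisson_term(t, b) * (1 + A / (2 * N) * ((b - t) ** 2 - b))

def exact_phi(ts, b, c):
    """Phi(b, c) from its definition: the i != j part of E(n_b n_c) over E(n_b) E(n_c)."""
    cross = sum(poisson_term(t_i, b) * poisson_term(t_j, c)
                for i, t_i in enumerate(ts) for j, t_j in enumerate(ts) if i != j)
    return cross / (exact_mean(ts, b) * exact_mean(ts, c))

def approx_phi(ts, b, c):
    N = len(ts)
    t = sum(ts) / N
    A = sum((t_i / t - 1) ** 2 for t_i in ts)
    return 1 - 1 / N - A / N ** 2 * (b - t) * (c - t)

# Hypothetical relative deviations lambda_i, chosen to sum to zero.
lam = [0.2, -0.1, -0.1, 0.05, -0.05, 0.1, -0.1, 0.0, 0.03, -0.03]
ts = [1.5 * (1 + x) for x in lam]
print(exact_mean(ts, 1), approx_mean(ts, 1))
print(exact_phi(ts, 1, 2), approx_phi(ts, 1, 2))
```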


18. Boxes in a systematic square. Another case which it may be worthwhile
to write down arises when the boxes are systematically “rotated” under “spouts”
of different probability. That is, the number of balls m is a multiple of the
number of boxes N, and the probability of the qth ball entering the ith box
depends on the value of $q - i$ taken modulo N. An example for N = 3 and
m = 6 follows:

Probabilities of entry

Box    Ball 1    2        3        4        5        6
 1     $p_0$     $p_1$    $p_2$    $p_0$    $p_1$    $p_2$
 2     $p_2$     $p_0$    $p_1$    $p_2$    $p_0$    $p_1$
 3     $p_1$     $p_2$    $p_0$    $p_1$    $p_2$    $p_0$

If $m = kN$ and the subscript r runs through $0, 1, 2, \ldots, N - 1$, then the
expectation of f(x) becomes

$\sum_i \prod_q E(x_{iq}) = N\left\{\prod_r (1 - p_r + p_r x)\right\}^k.$

Thus first moments, and by proceeding similarly higher moments, are available
for this case also.
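The rotation scheme is easy to tabulate. The sketch below assumes balls and boxes are numbered from 1 and that the qth ball enters the ith box with probability $p_r$, where $r = (q - i) \bmod N$; the function names are illustrative.

```python
def spout_index(ball, box, n_boxes):
    """Index r of the probability p_r with which `ball` enters `box`:
    r = (ball - box) mod N, balls and boxes numbered from 1."""
    return (ball - box) % n_boxes

def entry_table(n_boxes, n_balls):
    """Rows are boxes, columns are balls; entries are the indices r."""
    return [[spout_index(q, i, n_boxes) for q in range(1, n_balls + 1)]
            for i in range(1, n_boxes + 1)]

for row in entry_table(3, 6):
    print(row)
```

Each row contains every index r exactly m/N times, which is why the product over balls for one box reduces to $\left\{\prod_r (1 - p_r + p_r x)\right\}^k$.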


PART III 


THE POISSON CASE 

19. The Poisson case with equal boxes. The Poisson case is obtained in the
limit as $m \to \infty$ and $p \to 0$ with $pm = t$. We wish to show that, in the limit, the
number of balls in the different boxes are independent. Let $k_1, k_2, \ldots, k_N$ be
the number of balls in the first, second, ..., Nth box, respectively. Then the
distribution of the k’s is given by, where we write $k = k_1 + k_2 + \cdots + k_N$,

$\dfrac{m^{(k)}}{k_1!\,k_2!\cdots k_N!}\,p^k(1 - Np)^{m-k} = \dfrac{m^{(k)}}{m^k}\cdot\dfrac{(1 - Np)^{m-k}}{e^{-Nmp}}\cdot\prod_i e^{-mp}\,\dfrac{(mp)^{k_i}}{k_i!}.$

Now the first two fractions clearly approach unity in the limit, and the independence
is proved.

Since the number of balls in each box has an independent Poisson distribution,
the distribution of the numbers of boxes each with exactly b balls is that of a
random sample of N from an infinite population, namely it is a multinomial
distribution with probabilities

$\dfrac{t^b e^{-t}}{b!}.$


REFERENCES 

[1] H. H. Newcombe, “Delayed phenotypic expression of spontaneous mutations in Escherichia coli,” Genetics, Vol. 33 (1948), pp. 447-476.

[2] I. Opatowski, “Chain processes and their biophysical applications: Part I. General Theory,” Bulletin of Mathematical Biophysics, Vol. 7 (1945), pp. 161-180.

[3] V. Romanovsky, “Su due problemi di distribuzione casuale,” Giornale dell’Istituto Italiano degli Attuari, Vol. 5 (1934), pp. 196-218.

[4] W. L. Stevens, “Significance of grouping,” Annals of Eugenics, Vol. 8 (1937-1938), pp. 57-59.



THE POWER OF THE CLASSICAL TESTS ASSOCIATED WITH THE 

NORMAL DISTRIBUTION 

By J. Wolfowitz
Columbia University 

Summary. The present paper is concerned with the power function of the 
classical tests associated with the normal distribution. Proofs of Hsu, Simaika, 
and Wald are simplified in a general manner applicable to other tests involving 
the normal distribution. The set theoretic structure of several tests is charac¬ 
terized. A simple proof of the stringency of the classical test of a linear hypothesis 
is given. 

1. Introduction. The present paper is concerned with the optimum properties, 
from the power function viewpoint, of the classical tests associated with the 
normal distribution. In 1941 Hsu [2] proved the result stated in Section 2 below, 
which is concerned with the general linear hypothesis (in this connection his 
paper [1] of 1938 will be of interest). Also in 1941 Simaika [3] proved similar 
results for the tests based on the multiple correlation coefficient and Hotelling’s 
generalization of Student’s t. In 1942, Wald [4] gave a generalization of Hsu’s 
result. 

In the present paper we give short and simple proofs of almost all these
results, and a simple proof of the stringency property of the analysis of variance 
(Section 5). These proofs rest on theorems which characterize the set theoretic 
structure of the tests. Thus, while the proofs of Hsu, Simaika and Wald are 
rather elaborate and each problem is essentially attacked de novo, the methods 
of the present paper are in effect applicable to the classical tests based on the 
normal distribution. For these tests it will not be difficult to demonstrate the 
analogues of Theorems 1 and 3, and of the results of Hsu, Simaika, and Wald. 
In the present paper we first treat the general linear hypothesis, because it is the 
simplest problem, its solution is easiest to describe, and it admits Wald’s integra¬ 
tion theorem. Multivariate analogues of the latter are rather artificial and not as 
simple. We then discuss the problem of the multiple correlation coefficient, 
because it seems to be more difficult than that of Hotelling’s T and indeed, to 
include all the essential multivariate difficulties. Theorems 6 and 7 are the 
analogues of 1 and 3, respectively, while Theorem 9 describes the essential 
property of the power function which is of interest to us. In other multivariate 
problems one will prove the analogues of Theorems 6, 7 and 9. A generally 
inclusive formulation is no doubt possible. Theorems 5 and 9 are slightly more 
general than the theorems of Hsu and Simaika. 

Many of the statements below may be not valid on exceptional sets of measure 
zero. Usually this is so stated, but sometimes, for reasons of brevity or to avoid 
repetition, this qualification may be omitted. The reader will have no difficulty 
supplying it wherever necessary. 




The author is indebted to Erich L. Lehmann of the University of California,
who carefully read a first version of this paper. Theorem 4 below was arrived at 
independently by Professor Lehmann, with a somewhat different proof. 


2. The general linear hypothesis. In canonical form the general linear hypothesis
may be stated as follows: The chance variables

$X_1, \ldots, X_{k+l}$

have at $x_1, \ldots, x_{k+l}$, the density function

(2.1)  $(\sqrt{2\pi}\,\sigma)^{-(k+l)}\exp\left[-\dfrac{1}{2\sigma^2}\left(\sum_{i=1}^{k}(x_i - \eta_i)^2 + \sum_{i=k+1}^{k+l}x_i^2\right)\right] = f(\eta, \sigma)$

with $\sigma, \eta_1, \ldots, \eta_k$ all unknown.

Let $\eta$ be the vector $(\eta_1, \ldots, \eta_k)$. The null hypothesis $H_0$ states that

$\eta_1 = \cdots = \eta_k = 0$

and is to be tested with constant size $\alpha < 1$ (identically in $\sigma$).

Let D be any admissible critical region for testing $H_0$. If A is any event let
$P\{A \mid \eta, \sigma\}$ denote the probability of A when $\eta$ and $\sigma$ are the parameters of
(2.1). We have then

$P\{D \mid 0, \sigma\} = \alpha$

identically in $\sigma$, where 0 is the vector with k components all of which are zero.
We now prove a property which characterizes all D. This theorem is due to
Neyman and Pearson [12], and is given here only for completeness.

Theorem 1. The fraction of the surface area of the sphere

$\sum_{i=1}^{k+l} x_i^2 = c^2$

which lies in D is $\alpha$ for almost all c.

Proof. Let a be any positive integer, h a positive parameter, and $\psi(y)$ a
measurable function of y defined for $y > 0$ and such that $0 \le \psi(y) \le 1$. In view
of the distribution of $\sum x_i^2$, it will be enough to prove that, if

$\dfrac{h^{a+1}}{\Gamma(a+1)}\int_0^\infty \psi(y)\,y^a e^{-hy}\,dy = \alpha$

identically for all positive h, that then

$\psi(y) = \alpha$ for almost all y.

Write

(2.2)  $\dfrac{1}{\alpha\,\Gamma(a+1)}\int_0^\infty \psi(y)\,y^a e^{-hy}\,dy = h^{-(a+1)}.$

Differentiating both members k times with respect to h and then setting h = 1
we obtain the following result. The function

$\dfrac{\psi(y)\,y^a e^{-y}}{\alpha\,\Gamma(a+1)}$

is a density function with kth moment

$\mu_k = (a+1)(a+2)\cdots(a+k).$

The moments $\mu_k$ are the moments of the density function

$\dfrac{1}{\Gamma(a+1)}\,y^a e^{-y}.$

They satisfy the Carleman criterion [5, p. 19, Th 1.10], and hence no essentially
different distribution can have these moments. This proves the desired result.
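The moment identity used in the proof, and the slow divergence behind the Carleman criterion, can be illustrated numerically. The sketch below uses the closed form $\mu_k = \Gamma(a+k+1)/\Gamma(a+1)$ for the density $y^a e^{-y}/\Gamma(a+1)$ and evaluates partial sums of the series $\sum_k \mu_{2k}^{-1/(2k)}$, one common form of the Carleman condition; the function names are illustrative.

```python
from math import exp, gamma, isclose, lgamma, prod

def gamma_moment(a, k):
    """k-th moment of the density y**a * exp(-y) / Gamma(a + 1) on y > 0."""
    return gamma(a + k + 1) / gamma(a + 1)

def rising_product(a, k):
    """(a + 1)(a + 2)...(a + k), the value given in the proof."""
    return prod(a + j for j in range(1, k + 1))

def carleman_partial_sum(a, n_terms):
    """Partial sum of sum_k mu_{2k} ** (-1 / (2k)), using log-gamma to avoid overflow."""
    total = 0.0
    for k in range(1, n_terms + 1):
        log_moment = lgamma(a + 2 * k + 1) - lgamma(a + 1)
        total += exp(-log_moment / (2 * k))
    return total

print(isclose(gamma_moment(2, 3), rising_product(2, 3)))
for n in (10, 100, 1000, 10000):
    print(n, carleman_partial_sum(2, n))
```

The partial sums keep growing, roughly like a multiple of log n, which is the behaviour the criterion requires.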

Theorem 2 (Wald). Among all tests of the general linear hypothesis the analysis
of variance test has the property that, for all positive d, the integral of its power on the
surface $\sum_{i=1}^{k}\eta_i^2 = d^2$ is a maximum.

Proof. Let c be any positive number. We have only to show that if we allocate
to the critical region D of the test the fraction $\alpha$ of the surface area of the sphere

(2.3)  $\sum_{i=1}^{k+l} x_i^2 = c^2$

for which

$C = \dfrac{\sum_{i=1}^{k} x_i^2}{\sum_{i=k+1}^{k+l} x_i^2}$

is as large as possible and that if we do this for all c, the desired maximum of the
integral of the power will be achieved. If C is as large as possible so is

$\dfrac{\sum_{i=1}^{k} x_i^2}{\sum_{i=1}^{k+l} x_i^2}.$

Let $a_1, \ldots, a_{k+l}$ be any point on the sphere (2.3). Let db be the differential of
area on the surface $\sum_{i=1}^{k}\eta_i^2 = d^2$. Then

(2.4)  $\int f(\eta, \sigma)\,db = (\sqrt{2\pi}\,\sigma)^{-(k+l)}\exp\left\{-\dfrac{c^2 + d^2}{2\sigma^2}\right\}\int \exp\left\{\dfrac{\eta' z}{\sigma^2}\right\}db$

where z is the vector $(a_1, \ldots, a_k)$ and $\eta' z$ is the scalar product of the two
vectors. This last integral is easily seen to depend only upon $|z|$ and to be
monotonically increasing in $|z|$. This proves the theorem.





Corollary (Hsu). Among all tests of the general linear hypothesis whose power
is a function of $\sum_{i=1}^{k}\eta_i^2/\sigma^2$ only, the analysis of variance is the most powerful.

3. The set theoretic structure of tests whose power is a function only of $\sum \eta_i^2/\sigma^2$.
Wald’s result (Theorem 2) cannot always be extended, in its simple form, to
tests involving the multivariate normal distribution, but this can be done with
Hsu’s theorem (corollary to Theorem 2). In order to see what is involved we
shall investigate the set theoretic structure of tests of the general linear hypothesis
whose power is a function only of $\sum \eta_i^2/\sigma^2$.

Let $q(x_1, \ldots, x_k)$ be the set of points in the region D whose first k coordinates
are $x_1, \ldots, x_k$. Let $A(x_1, \ldots, x_k, \sigma)$ be the integral of

$(\sqrt{2\pi}\,\sigma)^{-l}\exp\left[-\dfrac{1}{2\sigma^2}\sum_{i=k+1}^{k+l} x_i^2\right]$

with respect to $x_{k+1}, \ldots, x_{k+l}$, taken over $q(x_1, \ldots, x_k)$. We first prove the
following:

Lemma. Suppose the power of D is a function only of $\sum \eta_i^2/\sigma^2$. Then for two points

$x_1, \ldots, x_k$

and

$x_1', \ldots, x_k'$

such that

(3.1)  $\sum_{i=1}^{k} x_i^2 = \sum_{i=1}^{k} x_i'^2$

we have

(3.2)  $A(x_1, \ldots, x_k, \sigma) = A(x_1', \ldots, x_k', \sigma)$

identically in $\sigma$, with the exception of a set of measure zero.

Proof. Suppose the statement is false. Then under some orthogonal transformation
T of $x_1, \ldots, x_k$ the region D would go over into a region D* with the
following property: Let $A^*(x_1, \ldots, x_k, \sigma)$ have the same definition for the region
D* as $A(x_1, \ldots, x_k, \sigma)$ has for D. Then on a set of positive measure¹ we would
have

(3.3)  $A(x_1, \ldots, x_k, \sigma) \ne A^*(x_1, \ldots, x_k, \sigma).$

We shall now show that (3.3) results in a contradiction. We have

(3.4)  $P\{D \mid \eta, \sigma\} = P\{D^* \mid T\eta, \sigma\}$

identically in $\eta$. By the property of the region D, therefore, we have

$P\{D \mid \eta, \sigma\} = P\{D \mid T^{-1}\eta, \sigma\}$

and hence

(3.5)  $P\{D \mid \eta, \sigma\} = P\{D^* \mid \eta, \sigma\}$

identically in $\eta$. Thus we obtain

(3.6)  $\int (2\pi\sigma^2)^{-k/2}A(x_1, \ldots, x_k, \sigma)\exp\left[-\dfrac{1}{2\sigma^2}\sum_{i=1}^{k}(x_i - \eta_i)^2\right]dx_1\cdots dx_k$

$\qquad = \int (2\pi\sigma^2)^{-k/2}A^*(x_1, \ldots, x_k, \sigma)\exp\left[-\dfrac{1}{2\sigma^2}\sum_{i=1}^{k}(x_i - \eta_i)^2\right]dx_1\cdots dx_k$

with the integrations taking place over the entire space. Differentiating both
members with respect to the components of $\eta$ and setting $\eta = 0$, we obtain that
the two density functions (for fixed $\sigma$)

$(2\pi\sigma^2)^{-k/2}\alpha^{-1}A(x_1, \ldots, x_k, \sigma)\exp\left[-\dfrac{1}{2\sigma^2}\sum_{i=1}^{k} x_i^2\right]$

and

$(2\pi\sigma^2)^{-k/2}\alpha^{-1}A^*(x_1, \ldots, x_k, \sigma)\exp\left[-\dfrac{1}{2\sigma^2}\sum_{i=1}^{k} x_i^2\right]$

have identical moments. We shall now argue that these moments satisfy the
conditions of Cramér and Wold [7, Th. 2], so that the two density functions are
essentially the same, in contradiction to (3.3). The Cramér-Wold theorem states
the following: Let $Y_1, \ldots, Y_k$ be k chance variables with a joint distribution
function, and write

$\lambda_{2n} = \sum_{i=1}^{k} E(Y_i^{2n}).$

Then the divergence of the series

$\sum_{n=1}^{\infty} \lambda_{2n}^{-1/2n}$

is sufficient to ensure that there exists essentially only one distribution which has
these moments. We notice that the factor $1/\alpha$ of course makes no difference.
If we set $A(x_1, \ldots, x_k, \sigma)$ and $A^*(x_1, \ldots, x_k, \sigma)$ both identically unity and
consider the resulting moments which enter into the $\lambda_{2n}$, we see that these
moments satisfy the Cramér-Wold condition. Now A and A* are $\le 1$. Thus,
using the true value of A can serve only to increase the value of $\lambda_{2n}^{-1/2n}$, so that
the series will diverge a fortiori. This proves the lemma.

¹ The situation here is similar to that described in footnote 3.

The following theorem helps to describe the set theoretic structure of tests
whose power is a function only of $\lambda = \sum_{i=1}^{k}\eta_i^2/\sigma^2$.

Theorem 3. Let D be a test whose power is a function only of $\lambda$. Let u be any
positive number, and $D(x_1, \ldots, x_k, u)$ be the fraction of the “area” of the sphere
$\sum_{j=1}^{l} x_{k+j}^2 = u^2$ occupied by points which are in D and whose first k coordinates
are $x_1, \ldots, x_k$. If

(3.7)  $\sum_{i=1}^{k} x_i^2 = \sum_{i=1}^{k} x_i'^2$

then, except on a set of measure zero,

(3.8)  $D(x_1, \ldots, x_k, u) = D(x_1', \ldots, x_k', u).$

Proof. We shall show that, if the power of D is a function only of $\lambda$, the
failure of (3.7) to imply (3.8) would contradict the preceding lemma. Suppose
then that (3.8) is not true on a set of positive measure. Under some orthogonal
transformation on $x_1, \ldots, x_k$ we obtain² a function $D^*(x_1, \ldots, x_k, u)$ which
differs from $D(x_1, \ldots, x_k, u)$ on a set of positive measure and such that, for
almost every $x_1, \ldots, x_k$,

$A(x_1, \ldots, x_k, \sigma) = K\int_0^\infty D(x_1, \ldots, x_k, u)\,\sigma^{-l}u^{l-1}e^{-u^2/2\sigma^2}\,du$

$\qquad = K\int_0^\infty D^*(x_1, \ldots, x_k, u)\,\sigma^{-l}u^{l-1}e^{-u^2/2\sigma^2}\,du$

identically in $\sigma$, where K is a suitable constant of no interest to us. Multiplying
by $\sigma^l$, differentiating repeatedly under the integral sign with respect to $\sigma$, and
setting $\sigma = 1$, we obtain the result that the two density functions in u,

$\dfrac{K\,D(x_1, \ldots, x_k, u)\,u^{l-1}e^{-u^2/2}}{A(x_1, \ldots, x_k, 1)}$

and

$\dfrac{K\,D^*(x_1, \ldots, x_k, u)\,u^{l-1}e^{-u^2/2}}{A(x_1, \ldots, x_k, 1)}$

are identical except perhaps on a set of measure zero. This contradiction proves
the theorem.

Theorem 4. A necessary and sufficient condition that the power of D be a function
of $\lambda$ only, is that, with the usual exception of a set of measure zero, $D(x_1, \ldots, x_k, u)$
be a function only of

$\dfrac{\sum_{i=1}^{k} x_i^2}{u^2}.$

The proof of this theorem is not essentially different from that of the preceding
theorem, and we shall therefore sketch it only briefly. Let Z be a transformation
on $(x_1, \ldots, x_k, u) = (x, u)$ which consists of a rotation of the vector x, followed
by a multiplication of u and the components of x by a positive constant c. If
$D(x, u)$ is not a function of $\sum x_i^2/u^2$ alone, then, just as before³, we can use some
transformation Z to give us a function $D^*(x, u)$ such that

$D(x, u) \ne D^*(x, u)$

on a set of positive measure, while

$E\,D(x, u) = E\,D^*(x, u)$

identically in $\eta$, $\sigma$. This yields a contradiction in the usual manner and proves
the necessity of the condition.

To prove sufficiency, write $D(x, u) = \nu\left(\sum x_i^2/u^2\right) = \nu(v)$. Let $\gamma(v, \eta, \sigma)$ be the
density function of v. Then

$P\{D \mid \eta, \sigma\} = \int_0^\infty \nu(v)\,\gamma(v, \eta, \sigma)\,dv.$

By hypothesis, $\nu(v)$ is a function only of v. We know [9, p. 140, eq. 101] that
$\gamma(v, \eta, \sigma)$ is a function only of v and $\lambda$. Hence $P\{D \mid \eta, \sigma\}$ is a function only of $\lambda$.

This completes the proof of the theorem.

Theorem 5. Among all tests of the general linear hypothesis which have the
properties described in the conclusions of Theorems 1 and 3, the classical analysis
of variance test is the most powerful.

We shall omit the proof of this theorem, which is very similar to that of the
more difficult Theorem 9 below.

Theorem 4 above shows that there exist regions D which satisfy the conclusions
of Theorems 1 and 3 and such that $P\{D \mid \eta, \sigma\}$ is not a function of $\lambda$ alone. It
follows that the content of Theorem 5 is greater than that of Hsu’s theorem
(Corollary to Theorem 2).

It is instructive to note that Hsu’s theorem follows almost immediately from
Theorem 4 and the form of $\gamma(v, \lambda)$. For let $\lambda$ be fixed but arbitrary. One verifies
immediately from the form of $\gamma(v, \lambda)$ that

$\dfrac{\gamma(v, \lambda)}{\gamma(v, 0)}$

is, for fixed $\lambda$, a monotonically increasing function of v. This, by Neyman’s
lemma, immediately proves Hsu’s result.

² See footnote 1.

³ This statement implies that a function of $x_1, \ldots, x_k, u$, which is invariant to within
sets of measure zero under all transformations Z (the exceptional set may depend on the
transformation), is a function of $\sum x_i^2/u^2$ except on a set of measure zero. This statement
would be completely trivial were it not for the exceptional sets; in any case it must be well
known to set theorists. The author constructed an unnecessarily long proof of it, and
believes that a more expeditious proof can be constructed using the ideas of [11, page 91,
Theorem 11.1, and page 318, p. 7]. Professor C. M. Stein of the University of California
has informed the author that this result is a special case of one established by himself and
G. H. Hunt in a forthcoming paper. For these reasons the proof is omitted. (See also [13,
page 27, Lemma 9.1].)

4. The multiple correlation coefficient. We shall now apply our methods to a
multivariate test. For typographic ease we shall conduct the discussion for the
case of three variates, but the reader will observe that the procedure is really
perfectly general.

The chance variables $\{Y_{ij}\}$, $i = 1, 2, 3$, $j = 1, \ldots, n$, have the density
function

(4.1)  $g(B) = (2\pi)^{-3n/2}\,|B|^{n/2}\exp\left[-\dfrac{1}{2}\sum_{j=1}^{n}\sum_{i,k=1}^{3} b_{ik}\,y_{ij}\,y_{kj}\right]$

where 1) $B = \{b_{ij}\}$ is a positive definite (symmetric) 3 × 3 matrix, 2) $y_{ij}$ is the
value assumed by $Y_{ij}$. The null hypothesis $H_0$ asserts that a given multiple
correlation coefficient is zero, say that of $Y_1$ with $Y_2$ and $Y_3$, i.e.,

(4.2)  $b_{12} = b_{21} = b_{13} = b_{31} = 0.$

The test is to be made on the level of significance $\alpha$, i.e., if $B_0$ is any matrix
which satisfies (4.2), and if G is a critical region for testing $H_0$, then

(4.3)  $P\{G \mid B_0\} = \alpha$

where the symbol in the left member means the probability of G according
to $g(B_0)$.

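Write

$n\,s_{ik} = \sum_{j=1}^{n} y_{ij}\,y_{kj}, \qquad i, k = 1, 2, 3,$

and let S denote the matrix $\{s_{ik}\}$, $i, k = 2, 3$.

Let $M(c_{11}, C)$ be the manifold in the 3n-space of

$y_{11}, \ldots, y_{ij}, \ldots, y_{3n}$

where $s_{11} = c_{11}$, $S = C$. First we prove the following: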

Theorem 6. Any region G which satisfies (4.3) must have the property that the
fraction of the area of $M(c_{11}, C)$ which lies in G is $\alpha$, for any positive $c_{11}$ and any
positive definite 2 × 2 matrix $C = \{c_{ij}\}$. (We remind the reader that exceptional
sets of measure zero are not precluded.)

Proof. Let $\psi(c_{11}, C)$ be the fraction of the area of $M(c_{11}, C)$ in G. Recall
equation (4.3) and the fact that $s_{11}, s_{22}, s_{23}, s_{33}$ are sufficient statistics for the
elements of $B_0$. On the manifold $M(c_{11}, C)$ the conditional density is uniform.
Employing Wishart’s distribution [ 0 ] we conclude that 

(4.4) K' j ^(sn ,S)\Bo\N\S | 

• exp ^ —2 {^u®n “h ^*22822 "h 2623 S 28 ”t" fessSsslJ dsn ds 2-2 ds^o ds 33 = a 
where K' is a suitable constant which need not concern us. Here the symbol 



means identically in $b_{11}, b_{22}, b_{23}, b_{33}$, provided only that $b_{11} > 0$, $b_{22} > 0$,
$b_{22}b_{33} - b_{23}^2 > 0$. Of course $s_{11}$ is distributed independently of $s_{22}, s_{23}, s_{33}$.
Proceeding as in section 2, we can, by differentiation with respect to the b’s,
obtain all the moments of the $s_{ij}$’s. Now let the b’s take any admissible constant
values. The moments of the $s_{ij}$’s are then seen to satisfy the criterion of Cramér
and Wold [7, Th. 2], and consequently essentially uniquely determine the
distribution of the $s_{ij}$. The desired conclusion follows as before.

The six parameters which uniquely determine the trivariate normal distribution
(of $Y_1, Y_2, Y_3$) with zero means may be taken to be the following:

1) The covariance matrix $\{\sigma_{ij}\}$, $i, j = 2, 3$, of $Y_2$ and $Y_3$.

2) The partial regression coefficients $\beta_2, \beta_3$ of $Y_1$ on $Y_2$ and $Y_3$. These are
defined as follows: Let $E(Y_1 \mid Y_2 = y_2, Y_3 = y_3)$ denote the conditional expected
value of $Y_1$, given $Y_2 = y_2$, $Y_3 = y_3$. Then

$E(Y_1 \mid Y_2 = y_2, Y_3 = y_3) = \beta_2 y_2 + \beta_3 y_3.$

3) The conditional variance $\omega^2$ of $Y_1$, given $Y_2 = y_2$, $Y_3 = y_3$.

The population multiple correlation coefficient R of $Y_1$ with $Y_2$ and $Y_3$ is then
defined by

$\dfrac{R^2\omega^2}{1 - R^2} = \beta_2^2\sigma_{22} + 2\beta_2\beta_3\sigma_{23} + \beta_3^2\sigma_{33}.$

The six parameters above may be chosen arbitrarily, provided only that $\{\sigma_{ij}\}$
is positive definite. R and $\omega$ are, by definition, non-negative.

Let $y_i$ be the column vector $y_{i1}, \ldots, y_{in}$; let $y_i'$ be its transpose, and let y
denote the point $y_{11}, y_{12}, \ldots, y_{1n}, y_{21}, \ldots, y_{3n}$ in 3n-space. Let $z(y) =
z(y_1, y_2, y_3)$ be the component of $y_1$ in the plane of $y_2$ and $y_3$; let $r = |z(y)|$ and $\theta$
the angle between z and $y_2$, measured positively say in the direction of $y_3$.
Finally let h be the absolute value of the vector $y_1 - z(y_1, y_2, y_3)$.

We intend now to investigate the set theoretic structure of tests whose power
is a function only of R, and for this purpose prove the following:

Theorem 7. Let H be a region whose power is a function only of R. Let
$V(h, r, \theta, s_{22}, s_{23}, s_{33})$ be the fraction of the “volume” of the manifold on which
$h, r, \theta, s_{22}, s_{23}, s_{33}$ are fixed which is contained in H. With the usual exception of a
set of measure zero, for fixed $h, r, s_{22}, s_{23}, s_{33}$, the quantity V above is constant for
all $\theta$.

Later, after this theorem is proved, we shall write V without exhibiting $\theta$.
This procedure is justified by Theorem 7.

Proof. Suppose the theorem false, and proceed as in Theorem 3. A suitable
rotation (see footnote 1) of the radius vector $z(y)$ implies an orthogonal transformation T on the
generic point y which leaves $h, r, s_{22}, s_{23}$, and $s_{33}$ unaltered, and takes the region H
into a region H* such that H and H* differ on a set of positive measure. T leaves
R invariant, hence leaves invariant $\bar{R}$, which uniquely determines the distribution
of R. Hence an argument almost the same as that which led us to (3.5) yields the
conclusion that the power of H and the power of H* are equal, identically in B.
Proceeding as in Theorem 3, we obtain two essentially different density functions
in $h, r, s_{22}, s_{23}, s_{33}$, whose integrals over the entire space are identical in the
elements of B. From these functions we obtain two different density functions in
$s_{ij}$ ($i, j = 1, 2, 3$), with identical moments (obtained by differentiation with
respect to the elements of B). The rest of the proof is essentially no different
from that of Theorem 3.

Theorem 8. In order that the power of H be a function of $\bar{R}$ alone, it is necessary
and sufficient that, with the usual exception of a set of measure zero, $V(h, r, s_{22}, s_{23}, s_{33})$
be a function only of $h/r$ (i.e., of R).

The proof of this theorem is essentially the same as the proof of Theorem 4.
The place of the transformation Z is taken by one which consists of any linear
transformation on the vectors $y_2$ and $y_3$, the addition of a constant angle to $\theta$
(rotation of $z(y)$), and multiplication of the vector $y_1$ by a positive scalar c.
This transformation leaves R invariant. In the proof of sufficiency we use the
distribution of R (see, for example, [10, p. 384, equation (15.55)]). The remainder
of the proof is essentially the same as that of Theorem 4.

Theorem 9. Among all tests H which have the properties described in the conclusions
of Theorems 6 and 7, the classical test based on R is the most powerful.

As a corollary to this theorem we have the following result due to Simaika
[3]: Of all tests H whose power is a function of $\bar{R}$ only, the classical test based
on R is the most powerful.

Simaika's result also follows easily from Theorem 8 and the density function
of R in the same manner that Hsu's result followed from Theorem 4 and the
density function of v.

In the course of the proof of Theorem 9, the various symbols W, with or
without subscripts, will denote suitable functions of the variables exhibited,
and the various symbols k, with or without subscripts, will denote suitable
constants.

We have that 

= f exp ^ {yi — ( 182^2 + • 

•W^o(82»»*2»>Sm> - dyzn = (2ir«*)^ 

exp {y* + + 2j82/8»«2» + • 

• pro(822> S28, 8 ti, {o'*,-}) dyii • • • dyin. 
Now $(\beta_2 y_2 + \beta_3 y_3)' y_1$ is a function only of $\beta_2, \beta_3, s_{22}, s_{23}, s_{33}, r$, and $\theta$. Also
$h^2 + r^2 = s_{11} = y_1' y_1$. Thus

P[H I jB} a= ^ y(fe, r, 522,528, 588)Tri(A, T, 522, 528, 588, [B]) 

• exp {>«'■ + jSa y») '^1 do dh dr dsn dsn dsu = J V(h,rstt, Sn, Ssj) 


( 46 ) ' S 8 s,{ 5 })( 4 Arr* exp 

• de dA* dr dsst dsa ds^ = j V WyX - r*, r, S22, S2», Ssa) 

' ^s(v^2/? — r*, r, Sjj, Saj, Sjj, {^}) (^*2/2 + fiiyt)'^ 


do dr^ dyX ds22 dsjj dss$. 


Integrating with respect to $\theta$ and designating

$$W_2 \int \exp\{\sigma^{-2} (\beta_2 y_2 + \beta_3 y_3)' y_1\}\, d\theta$$

by $W(\sqrt{y_1' y_1 - r^2}, r, s_{22}, s_{23}, s_{33}, \{B\})$ we observe that just as in (2.4), W is
monotonically increasing in r (all other variables fixed). Thus we have


(4.7) $P\{H \mid B\} = \int V W \, dr^2 \, d(y_1' y_1) \, ds_{22} \, ds_{23} \, ds_{33}.$

In constructing H only the function V is at our disposal, and this subject to the
limitations imposed by the conclusions of Theorems 6 and 7 and the fact that
$h^2 + r^2 = y_1' y_1 = s_{11}$. The function W is not within our control at all. With $y_1' y_1$,
$s_{22}, s_{23}, s_{33}$ fixed, W is monotonically increasing with r. To maximize the power
it is therefore best to distribute the "mass" so that V is as large as possible for
large values of r and hence of R. This implies the classical test and proves the
theorem.


5. Stringency of the classical tests. Wald [8] calls a test $T_1$ "most stringent"
if the following is true: Let $\{T\}$ be the totality of tests. Let $\theta$ be the generic
point in the parameter space, and $P\{T \mid \theta\}$ be the power of T at the point $\theta$.
Let $T_2$ be any test other than $T_1$. Then

$$\sup_{\theta} \left[ \sup_{\{T\}} P\{T \mid \theta\} - P\{T_1 \mid \theta\} \right] \le \sup_{\theta} \left[ \sup_{\{T\}} P\{T \mid \theta\} - P\{T_2 \mid \theta\} \right].$$

Of course, we have omitted to specify the totality $\{T\}$. One can admit all tests
whose size is at most $\alpha$, a given constant between 0 and 1, or restrict one's self to tests
whose size is exactly $\alpha$. We shall do the latter.

Under these circumstances we shall prove that the classical test of a linear
hypothesis is most stringent. Our proof will occupy but a few lines, and is an easy
consequence of the structure of the classical tests as described in the lemma of
section 2. The result itself is a special case of an unpublished theorem due to
G. H. Hunt and C. M. Stein, and all priority on this result is theirs.

Return then to the notation of section 2. Let $\sigma$ be fixed at any arbitrary
positive value, and the surface


be that one on which

$$\omega_1(\eta) = \sup_{\{T\}} P\{T \mid \eta\} - P\{L_1 \mid \eta\}$$

is a maximum, where $L_1$ is the classical test of the linear hypothesis. It is clear
that this maximum is actually achieved, and that $\omega_1(\eta)$ is a constant on this
surface. Let $L_2$ be any other test (of size $\alpha$), and $\omega_2(\eta)$ be the corresponding
function for $L_2$. We have only to show that on this surface
we cannot have everywhere $\omega_2(\eta) < \omega_1(\eta)$ and our proof is complete. If everywhere
on the surface we had $\omega_2(\eta) < \omega_1(\eta)$, we would have, also on the same
surface, $P\{L_2 \mid \eta\} > P\{L_1 \mid \eta\}$. This would, however, violate Wald's Theorem 2
(section 2) and proves the desired result.

REFERENCES

[1] P. L. Hsu, "Notes on Hotelling's generalized T," Annals of Math. Stat., Vol. 9 (1938), p. 231.

[2] P. L. Hsu, "Analysis of variance from the power function standpoint," Biometrika, Vol. 32 (1941), p. 62.

[3] J. B. Simaika, "On an optimum property of two important statistical tests," Biometrika, Vol. 32 (1941), p. 70.

[4] A. Wald, "On the power function of the analysis of variance test," Annals of Math. Stat., Vol. 13 (1942), p. 434.

[5] J. A. Shohat and J. D. Tamarkin, The Problem of Moments, The American Mathematical Society, New York, 1943.

[6] John Wishart, "The generalized product moment distribution, etc.," Biometrika, Vol. 20A (1928), p. 32.

[7] H. Cramér and H. Wold, "Some theorems on distribution functions," Lond. Math. Soc. Jour., Vol. 11 (1936).

[8] A. Wald, "Tests of statistical hypotheses concerning several parameters when the number of observations is large," Am. Math. Soc. Trans., Vol. 54 (1943), p. 426.

[9] P. C. Tang, "The power function of the analysis of variance etc.," Stat. Res. Memoirs, Vol. 2 (1938) (University of London), p. 126.

[10] M. G. Kendall, The Advanced Theory of Statistics, Vol. 1, Charles Griffin and Company, London, 1945.

[11] S. Saks, Theory of the Integral, (Second Edition), G. E. Stechert and Company, New York, 1937.

[12] J. Neyman and E. S. Pearson, "On the problem of the most efficient tests of statistical hypotheses," Roy. Soc. London Phil. Trans., Ser. A, Vol. 231 (1933), pp. 289-337.

[13] Eberhard Hopf, Ergodentheorie, Chelsea, New York, 1948.



APPLICATION OF THE METHOD OF MIXTURES TO QUADRATIC 
FORMS IN NORMAL VARIATES 

By Herbert Robbins and E. J. G. Pitman 
Institute of Statistics, University of North Carolina 

1. Summary. The method of mixtures, explained in Section 2, is applied to 
derive the distribution functions of a positive quadratic form in normal variates 
and of the ratio of two independent forms of this type. 

2. The method of mixtures. If

(1) $F_0(x), F_1(x), \ldots$

is any sequence of distribution functions, and if

(2) $c_0, c_1, \ldots$

is any sequence of constants such that

(3) $c_j \ge 0 \quad (j = 0, 1, \ldots), \qquad \sum c_j = 1$

(all summations will be from 0 to $\infty$ unless otherwise noted), then the function

(4) $F(x) = \sum c_j F_j(x)$

is called a mixture of the sequence (1).

It is sometimes helpful to interpret $F(x)$ in the following manner. Let $J, X_0,
X_1, \ldots$ be variates such that J has the distribution $P[J = j] = c_j$ $(j = 0, 1, \ldots)$
and such that $X_j$ has the distribution function $F_j(x)$. Let X be a variate such
that the conditional distribution function of X given $J = j$ is $F_j(x)$. Then the
distribution function of X is

$$P[X \le x] = \sum P[J = j] \cdot P[X \le x \mid J = j] = \sum c_j F_j(x) = F(x).$$

This interpretation of $F(x)$ will, however, not be involved in the present paper.
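The two-stage interpretation can be illustrated with a short simulation. The sketch below is not part of the paper: it assumes a finite mixture (so $c_j = 0$ beyond the third component), and the exponential components, the weights and the names `mixture_cdf` and `sample_mixture` are hypothetical choices made only for illustration. It draws J first, then X from the J-th component, and compares the empirical frequency of $X \le x_0$ with the value of (4).

```python
import math
import random

def mixture_cdf(x, weights, cdfs):
    """Evaluate F(x) = sum_j c_j F_j(x) for finitely many components."""
    return sum(c * F(x) for c, F in zip(weights, cdfs))

def sample_mixture(rng, weights, samplers):
    """Draw J with P[J = j] = c_j, then draw X from the J-th component."""
    j = rng.choices(range(len(weights)), weights=weights)[0]
    return samplers[j](rng)

# Hypothetical components: exponential laws with rates 1, 2, 4.
rates = [1.0, 2.0, 4.0]
weights = [0.5, 0.3, 0.2]          # c_j >= 0 and sum to 1, as in (3)
cdfs = [lambda x, r=r: 1.0 - math.exp(-r * x) if x > 0 else 0.0 for r in rates]
samplers = [lambda rng, r=r: rng.expovariate(r) for r in rates]

rng = random.Random(0)
draws = [sample_mixture(rng, weights, samplers) for _ in range(20000)]
x0 = 0.5
empirical = sum(d <= x0 for d in draws) / len(draws)   # estimate of P[X <= x0]
exact = mixture_cdf(x0, weights, cdfs)                  # F(x0) from (4)
print(round(empirical, 3), round(exact, 3))
```

With 20000 draws the two printed values should agree to about two decimal places; the agreement is statistical, not exact.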

The following statements are proved in [1]. If $x = (x_1, \ldots, x_n)$ is a vector
variable the function $F(x)$ defined by (4) is a distribution function, and for
any Borel set S,

(5) $\int_S dF(x) = \sum c_j \int_S dF_j(x).$

More generally, if $g(x)$ is any Borel measurable function then

(6) $\int_{-\infty}^{\infty} g(x)\, dF(x) = \sum c_j \int_{-\infty}^{\infty} g(x)\, dF_j(x)$

whenever the left hand side of (6) exists. In particular, the characteristic function
$\varphi(t)$ corresponding to $F(x)$ is

(7) $\varphi(t) = \sum c_j \varphi_j(t),$

where $\varphi_j(t)$ is the characteristic function corresponding to $F_j(x)$.

If each $F_j(x)$ has a derivative $f_j(x)$ then $F(x)$ has a derivative $f(x)$ given by

(8) $f(x) = \sum c_j f_j(x),$

provided that this series converges uniformly in some interval including x.
Conversely, if (8) is the relation between the frequency functions and if the
series is uniformly convergent in every finite interval, then the relation between
the distribution functions is given by (4). In practice we deduce (4) from (8), or,
using the uniqueness theorem for characteristic functions, from (7).

As regards computation, we observe that for any integers $0 \le p_1 \le p_2$ and
for any x it follows from (3) and (4) that

$$0 \le F(x) - \sum_{j=p_1}^{p_2} c_j F_j(x) = \sum_{j<p_1} c_j F_j(x) + \sum_{j>p_2} c_j F_j(x)$$

(9) $$\le \sup_{j<p_1}\{F_j(x)\} \cdot \Big(\sum_{0}^{p_1-1} c_j\Big) + \sup_{j>p_2}\{F_j(x)\} \cdot \Big(1 - \sum_{0}^{p_1-1} c_j - \sum_{p_1}^{p_2} c_j\Big) \le 1 - \sum_{p_1}^{p_2} c_j.$$

The existence of these upper bounds (the last a uniform one) for the error term
when the series (4) is replaced by a finite sum shows that series expansions of the
mixture type (4) are especially well adapted to computational work.

For some purposes it is useful to consider series expansions of the type (4)
where the $c_j$ may be of both signs and where the series $\sum c_j$ may diverge. Both
parts of (3) will, however, be satisfied in the cases considered here.

If U, V are independent variates with respective distribution functions
$F(x)$, $G(x)$ we shall denote the distribution function of any Borel measurable
function $H(U, V)$ by

$$H(U, V)(F(x), G(x)).$$

Now if $F(x)$, $G(x)$ are both mixtures,

$$F(x) = \sum c_j F_j(x), \qquad G(x) = \sum d_k G_k(x),$$

then by (5),

$$P[H(U, V) \le x] = \iint_{\{H(u,v) \le x\}} dF(u)\, dG(v) = \sum\sum c_j d_k \iint_{\{H(u,v) \le x\}} dF_j(u)\, dG_k(v),$$

so that

(10) $H(U, V)\big(\sum c_j F_j(x), \sum d_k G_k(x)\big) = \sum\sum c_j d_k\, H(U, V)(F_j(x), G_k(x)).$

As an application of the principles set forth in this section we shall express
as series of the mixture type (4) the distribution functions of any positive
quadratic form in normal variates and of the ratio of any two independent forms
of this type. Special cases of the problem have been dealt with by Tang [2],
Hsu [3], and many others, but the method of mixtures permits a unified and
simple treatment of the general case.


3. Distribution of a positive quadratic form. We shall denote by $F_n(x)$ the
chi-square distribution function with $n > 0$ degrees of freedom,

(11) $F_n(x) = \dfrac{1}{2^{n/2}\,\Gamma(n/2)} \int_0^x u^{n/2 - 1} e^{-u/2}\, du \quad (x > 0), \qquad = 0 \quad (x \le 0).$

The corresponding characteristic function is

(12) $\varphi_n(t) = \int_0^{\infty} e^{itx}\, dF_n(x) = (1 - 2it)^{-n/2} = w^{n/2},$

where we have set $w = (1 - 2it)^{-1}$. We shall denote by $\chi_n^2$ any variate with
the distribution function (11).

Let a be any constant such that $a > 0$. The characteristic function of the
variate $a\chi_n^2$ is

(13) $(1 - 2ait)^{-n/2} = [a(1 - 2it) - (a - 1)]^{-n/2} = a^{-n/2}\, w^{n/2}\, [1 - (1 - a^{-1}) w]^{-n/2}.$

By the binomial theorem we have for any $a > 0$,

(14) $a^{-n/2} [1 - (1 - a^{-1}) z]^{-n/2} = \sum c_j z^j \qquad (|z| < |1 - a^{-1}|^{-1}),$

where

(15) $c_j = a^{-n/2} \cdot \dfrac{\tfrac{n}{2}(\tfrac{n}{2} + 1) \cdots (\tfrac{n}{2} + j - 1)}{j!}\, (1 - a^{-1})^j \qquad (j = 0, 1, \ldots).$

For $a \ge 1$ we see from (15) that all the $c_j$ are non-negative. Likewise for $a > \tfrac{1}{2}$
(and hence a fortiori for $a \ge 1$) we have $|1 - 1/a|^{-1} > 1$ so that (14) holds
for all $|z| \le 1$; setting $z = 1$ it follows that the sum of all the $c_j$ is equal to 1.
Hence for $a \ge 1$,

$$c_j \ge 0 \quad (j = 0, 1, \ldots), \qquad \sum c_j = 1.$$

Since $|w| = |1 - 2it|^{-1} \le 1$ for all real t it follows from (13) and (14) that
for $a \ge 1$,

(16) $(1 - 2ait)^{-n/2} = \sum c_j w^{n/2 + j} = \sum c_j (1 - 2it)^{-(n + 2j)/2} = \sum c_j \varphi_{n+2j}(t).$

Hence for $a \ge 1$ the distribution function $F_n(x/a)$ of the variate $a\chi_n^2$ is a mixture
of $\chi^2$ distribution functions,

(17) $F_n(x/a) = \sum c_j F_{n+2j}(x),$

where the $c_j$, determined by the identity (14), are the probabilities of a negative
binomial distribution.
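A numerical check of (17) may help fix the notation. The following is a minimal sketch, not taken from the paper: the function names, the values a = 2.5, n = 3, x = 4 and the truncation at 200 terms are illustrative assumptions. The chi-square distribution function is computed from the standard power series for the regularized incomplete gamma function, using only the Python standard library.

```python
import math

def chi2_cdf(x, n):
    """Chi-square distribution function F_n(x), using the power series
    P(s, h) = sum_k h^(s+k) e^(-h) / Gamma(s+k+1) with s = n/2, h = x/2."""
    if x <= 0:
        return 0.0
    s, h = n / 2.0, x / 2.0
    total, k = 0.0, 0
    while True:
        term = math.exp(-h + (s + k) * math.log(h) - math.lgamma(s + k + 1))
        total += term
        if k > h and term < 1e-17:   # past the largest term and negligible
            return min(total, 1.0)
        k += 1

def scaled_chi2_weights(a, n, terms):
    """Coefficients c_j of (15), built from the ratio c_j / c_(j-1)."""
    c = [a ** (-n / 2.0)]
    for j in range(1, terms):
        c.append(c[-1] * (n / 2.0 + j - 1) / j * (1.0 - 1.0 / a))
    return c

a, n, x = 2.5, 3, 4.0                      # illustrative values with a >= 1
c = scaled_chi2_weights(a, n, terms=200)
lhs = chi2_cdf(x / a, n)                   # F_n(x/a), left side of (17)
rhs = sum(cj * chi2_cdf(x, n + 2 * j) for j, cj in enumerate(c))
print(lhs, rhs, sum(c))
```

The two sides should agree to many decimal places, and the weights should sum to 1 up to the neglected tail, which here decays like $0.6^j$.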

It may, in fact, be proved by a direct analysis, which we omit here, that (17)
holds for any $a > 0$. However, if $a < 1$ then the $c_j$ will be of alternating sign,
and if $a < \tfrac{1}{2}$ then the series $\sum c_j$ will diverge. This shows incidentally that a
relation of the form (4) can hold even though the series $\sum c_j$ diverges and hence
the corresponding relation (7) does not hold for $t = 0$.

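Theorem 1. Let

$$X = a(\chi_m^2 + a_1 \chi_{m_1}^2 + \cdots + a_r \chi_{m_r}^2),$$

where the chi-square variates are independent and $a, a_1, \ldots, a_r$ are positive constants
such that

$$a_i \ge 1 \qquad (i = 1, \ldots, r).$$

Define constants $c_j$ by the identity (see footnote 1)

(18) $\prod_{i=1}^{r} \big\{ a_i^{-m_i/2} [1 - (1 - a_i^{-1}) z]^{-m_i/2} \big\} = \sum c_j z^j \qquad (|z| \le 1);$

then obviously

$$c_j \ge 0 \quad (j = 0, 1, \ldots), \qquad \sum c_j = 1.$$

Let

$$M = m + m_1 + \cdots + m_r;$$

then for every x,

(19) $P[X \le x] = \sum c_j F_{M+2j}(x/a).$

For any integers $0 \le p_1 \le p_2$ and every x,

$$0 \le P[X \le x] - \sum_{j=p_1}^{p_2} c_j F_{M+2j}(x/a)$$

(20) $$\le F_M(x/a) \cdot \Big(\sum_{0}^{p_1 - 1} c_j\Big) + F_{M+2p_2+2}(x/a) \cdot \Big(1 - \sum_{0}^{p_1-1} c_j - \sum_{p_1}^{p_2} c_j\Big) \le 1 - \sum_{p_1}^{p_2} c_j.$$

Proof. The characteristic function of $X/a$ is, by (13) and (18),

$$\varphi(t) = w^{m/2} \prod_{i=1}^{r} \big\{ a_i^{-m_i/2}\, w^{m_i/2}\, [1 - (1 - a_i^{-1}) w]^{-m_i/2} \big\} = \sum c_j w^{M/2 + j} = \sum c_j \varphi_{M+2j}(t).$$

Hence for any y,

$$P[X/a \le y] = \sum c_j F_{M+2j}(y),$$

whence (19) follows on setting $x = ay$. Finally, since $F_n(x)$ is a decreasing function
of n for fixed x, (20) follows from (9).

Footnote 1: If $r = 0$ we regard the left hand side of (18) as having the value 1.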

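It should be observed that the coefficients $c_j$ determined by (18) can be
written explicitly as the multiple Cauchy products

$$c_j = \sum_{j_1 + \cdots + j_r = j} c_{1,j_1} \cdots c_{r,j_r},$$

where

$$c_{i,j} = a_i^{-m_i/2} \cdot \frac{\tfrac{m_i}{2}(\tfrac{m_i}{2} + 1) \cdots (\tfrac{m_i}{2} + j - 1)}{j!}\, (1 - a_i^{-1})^j \qquad (i = 1, \ldots, r;\; j = 0, 1, \ldots).$$

The $c_j$ may be computed stepwise by the relations

$$c_j^{(1)} = c_{1,j},$$

$$c_j^{(s)} = \sum_{i=0}^{j} \big\{ c_i^{(s-1)} \cdot c_{s,j-i} \big\} \qquad (s = 2, \ldots, r),$$

$$c_j = c_j^{(r)}.$$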

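4. Distribution of a ratio. The ratio $\chi_m^2/\chi_n^2$ of two independent chi-square
variates has the distribution function

(21) $F_{m,n}(x) = \dfrac{\Gamma(\tfrac{m+n}{2})}{\Gamma(\tfrac{m}{2})\,\Gamma(\tfrac{n}{2})} \int_0^x u^{m/2 - 1} (1 + u)^{-(m+n)/2}\, du \quad (x > 0), \qquad = 0 \quad (x \le 0).$

In computational work we can use the tables of the Beta distribution function

$$I_x(p, q) = \frac{\Gamma(p + q)}{\Gamma(p)\,\Gamma(q)} \int_0^x u^{p-1} (1 - u)^{q-1}\, du \quad (0 \le x \le 1), \qquad = 0 \quad (x < 0), \qquad = 1 \quad (x > 1),$$

together with the identity

$$F_{m,n}(x) = I_{x/(1+x)}(\tfrac{1}{2}m, \tfrac{1}{2}n).$$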

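Theorem 2. Let

(22) $X = a \cdot \dfrac{\chi_m^2 + a_1 \chi_{m_1}^2 + \cdots + a_r \chi_{m_r}^2}{\chi_n^2 + b_1 \chi_{n_1}^2 + \cdots + b_s \chi_{n_s}^2},$

where the $\chi^2$ variates are independent and $a, a_1, \ldots, a_r, b_1, \ldots, b_s$ are positive
constants such that

$$a_i \ge 1, \quad b_j \ge 1 \qquad (i = 1, \ldots, r;\; j = 1, \ldots, s).$$

Define constants $c_j$, $d_k$ by the identities

$$\prod_{i=1}^{r} \big\{ a_i^{-m_i/2} [1 - (1 - a_i^{-1}) z]^{-m_i/2} \big\} = \sum c_j z^j,$$

$$\prod_{j=1}^{s} \big\{ b_j^{-n_j/2} [1 - (1 - b_j^{-1}) z]^{-n_j/2} \big\} = \sum d_k z^k \qquad (|z| \le 1);$$

then

$$c_j \ge 0, \quad \sum c_j = 1, \quad d_k \ge 0, \quad \sum d_k = 1.$$

Let

$$M = m + m_1 + \cdots + m_r, \qquad N = n + n_1 + \cdots + n_s;$$

then for every x,

$$P[X \le x] = \sum\sum c_j d_k\, F_{M+2j,\,N+2k}(x/a),$$

and for any integers $0 \le p_1 \le p_2$, $0 \le q_1 \le q_2$ and every x,

$$0 \le P[X \le x] - \sum_{j=p_1}^{p_2} \sum_{k=q_1}^{q_2} c_j d_k\, F_{M+2j,\,N+2k}(x/a) \le 1 - \Big(\sum_{p_1}^{p_2} c_j\Big) \cdot \Big(\sum_{q_1}^{q_2} d_k\Big).$$

Proof. Let U, V denote respectively numerator and denominator of (22).
From Theorem 1,

$$P[U \le x] = \sum c_j F_{M+2j}(x),$$

$$P[V \le x] = \sum d_k F_{N+2k}(x).$$

Hence by (10), for every x,

$$P[X \le x] = P[U/V \le x/a] = \sum\sum c_j d_k\, F_{M+2j,\,N+2k}(x/a).$$

The rest of the theorem is obvious, since the omitted terms are non-negative and their
coefficients $c_j d_k$ sum to 1 minus the product of the two partial sums.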

Corollary. Let

$$Z = \frac{\chi_M^2}{a\chi_r^2 + b\chi_s^2},$$

where the $\chi^2$ variates are independent and

$$0 < a \le b.$$

Define

$$\alpha = a/b, \qquad N = r + s,$$

$$c_j = \alpha^{s/2} \cdot \frac{\tfrac{s}{2}(\tfrac{s}{2} + 1) \cdots (\tfrac{s}{2} + j - 1)}{j!}\, (1 - \alpha)^j \qquad (j = 0, 1, \ldots);$$

then

$$c_j \ge 0 \quad (j = 0, 1, \ldots), \qquad \sum c_j = 1,$$

and for every x,

$$P[Z \le x] = \sum c_j F_{M,\,N+2j}(ax).$$

For any integers $0 \le p_1 \le p_2$ and every x,

$$0 \le P[Z > x] - \sum_{j=p_1}^{p_2} c_j [1 - F_{M,\,N+2j}(ax)]$$

(23) $$\le [1 - F_{M,N}(ax)] \cdot \Big(\sum_{0}^{p_1 - 1} c_j\Big) + [1 - F_{M,\,N+2p_2+2}(ax)] \cdot \Big(1 - \sum_{0}^{p_1-1} c_j - \sum_{p_1}^{p_2} c_j\Big) \le 1 - \sum_{p_1}^{p_2} c_j.$$

Proof. Except for (23) this is a special case of Theorem 2. To prove (23)
we observe that

$$P[Z > x] = 1 - P[Z \le x] = \sum c_j [1 - F_{M,\,N+2j}(ax)],$$

and since for fixed m and x, $F_{m,n}(x)$ is an increasing function of n, (23) follows
in the same way as (9).

5. The non-central case. Let Y be normal (0,1) and let $X = (Y + d)^2$, where d
is any constant. The frequency function of X is, for $x > 0$,

$$f(x) = (2\pi x)^{-1/2} \cdot e^{-(x + d^2)/2} \cdot (e^{d\sqrt{x}} + e^{-d\sqrt{x}})/2.$$

By expanding the last factor into a power series it is easily seen that

(24) $f(x) = \sum p_j f_{1+2j}(x),$

where $f_n(x) = F_n'(x)$ is the chi-square frequency function with n degrees of
freedom and where

$$p_j = e^{-d^2/2} \cdot (\tfrac{1}{2} d^2)^j / j! \qquad (j = 0, 1, \ldots).$$

Since the identity

(25) $e^{\frac{1}{2} d^2 (z - 1)} = \sum p_j z^j \qquad (\text{all } z)$

holds, it follows that

$$p_j \ge 0 \quad (j = 0, 1, \ldots), \qquad \sum p_j = 1.$$

The series (24) is uniformly convergent in every finite interval, so that we
can write the distribution function $F(x)$ and characteristic function $\varphi(t)$ of X
in the forms

$$F(x) = \sum p_j F_{1+2j}(x),$$

$$\varphi(t) = \sum p_j w^{\frac{1}{2} + j} = w^{1/2}\, e^{\frac{1}{2} d^2 (w - 1)},$$

where again we have set $w = (1 - 2it)^{-1}$.

Now let $Y_1, \ldots, Y_n$ be independent and normal (0,1) variates and let

(26) $X = (Y_1 + d_1)^2 + \cdots + (Y_n + d_n)^2,$

where the $d_i$ are constants such that

$$d_1^2 + \cdots + d_n^2 = d^2.$$

The characteristic function of X is then

$$\varphi(t) = w^{n/2}\, e^{\frac{1}{2} d^2 (w - 1)} = \sum p_j w^{n/2 + j},$$

and hence the distribution function $F(x)$ of X is again a mixture of $\chi^2$ distribution
functions,

(27) $F(x) = \sum p_j F_{n+2j}(x),$

where the $p_j$, determined by the identity (25), are the probabilities of a Poisson
distribution with parameter $\lambda = \tfrac{1}{2} d^2$. We shall denote the non-central chi-square
variate (26) by $\chi_{n,d}^2$.
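The Poisson mixture (27) lends itself to direct computation. The sketch below is an illustration only: the function names, the truncation at 200 terms and the test values are assumptions, not part of the paper. It sums the series and, for n = 1, compares the result with the elementary expression for $P[(Y + d)^2 \le x]$ in terms of the normal distribution function.

```python
import math

def chi2_cdf(x, n):
    """Central chi-square distribution function (incomplete-gamma power series)."""
    if x <= 0:
        return 0.0
    s, h = n / 2.0, x / 2.0
    total, k = 0.0, 0
    while True:
        term = math.exp(-h + (s + k) * math.log(h) - math.lgamma(s + k + 1))
        total += term
        if k > h and term < 1e-17:
            return min(total, 1.0)
        k += 1

def noncentral_chi2_cdf(x, n, d, terms=200):
    """Series (27): Poisson(lambda = d^2/2) mixture of central chi-square CDFs."""
    lam = 0.5 * d * d
    p = math.exp(-lam)                    # p_0
    total = 0.0
    for j in range(terms):
        total += p * chi2_cdf(x, n + 2 * j)
        p *= lam / (j + 1)                # p_(j+1) = p_j * lambda / (j + 1)
    return total

def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Check for n = 1, where X = (Y + d)^2 has an elementary distribution function.
x, d = 3.0, 1.2
series = noncentral_chi2_cdf(x, 1, d)
direct = normal_cdf(math.sqrt(x) - d) - normal_cdf(-math.sqrt(x) - d)
print(series, direct)
```

For d = 0 all the Poisson weight falls on j = 0 and the function reduces to the central distribution function $F_n(x)$.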

We can now generalize Theorems 1 and 2 in a straightforward manner to 
cover non-central chi-square variates. We shall state only the generalization 
of the Corollary of Theorem 2 to the case in which the numerator is non-central. 


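Theorem 3. Let

$$X = \frac{\chi_{M,d}^2}{a\chi_r^2 + b\chi_s^2},$$

where the $\chi^2$ variates are independent and

$$0 < a \le b.$$

Define

$$\lambda = \tfrac{1}{2} d^2, \qquad \alpha = a/b, \qquad N = r + s,$$

$$p_j = e^{-\lambda} \cdot \lambda^j / j! \qquad (j = 0, 1, \ldots),$$

$$c_k = \alpha^{s/2} \cdot \frac{\tfrac{s}{2}(\tfrac{s}{2} + 1) \cdots (\tfrac{s}{2} + k - 1)}{k!}\, (1 - \alpha)^k \qquad (k = 0, 1, \ldots);$$

then

$$p_j \ge 0, \quad \sum p_j = 1, \qquad c_k \ge 0, \quad \sum c_k = 1,$$

and for every x,

$$P[X \le x] = \sum\sum p_j c_k\, F_{M+2j,\,N+2k}(ax).$$

For any integers $0 \le g_1 \le g_2$, $0 \le h_1 \le h_2$,

$$0 \le P[X \le x] - \sum_{j=g_1}^{g_2} \sum_{k=h_1}^{h_2} p_j c_k\, F_{M+2j,\,N+2k}(ax) \le 1 - \Big(\sum_{g_1}^{g_2} p_j\Big) \cdot \Big(\sum_{h_1}^{h_2} c_k\Big).$$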

REFERENCES

[1] Herbert Robbins, "Mixture of distributions," Annals of Math. Statistics, Vol. 19 (1948), p. 360.

[2] P. C. Tang, "The power function of the analysis of variance tests with tables and illustrations of their use," Stat. Res. Mem., Vol. 2 (1938), p. 126.

[3] P. L. Hsu, "Contributions to the theory of 'Student's' t-test as applied to the problem of two samples," ibid., p. 1.



THE JOINT DISTRIBUTION OF SERIAL CORRELATION
COEFFICIENTS

By M. H. Quenouille
Rothamsted Experimental Station

1. Summary. An expression for the joint distribution of serial correlation
coefficients, circularly defined, has been derived. It has been shown that this
distribution possesses properties similar to those already encountered in the
distribution of a single serial correlation coefficient, i.e. it is defined by different
functional forms for various subregions. The distribution thus found is of little
use for computational purposes. Consequently, approximate forms have been
investigated and the suitability of the ordinary partial correlation coefficient
for large-sample testing has been inferred.

2. Introduction. Anderson [1] has derived the distribution of the serial 
correlation coefficient 



where the $\varepsilon_i$ are normally and independently distributed with mean $\mu$ and
variance $\sigma^2$ and where a circular definition is employed, so that $\varepsilon_{n+i}$ is defined
to be equal to $\varepsilon_i$. However, in making a test of any series, we shall usually be
faced with a set of serial correlation coefficients, so that we shall require a joint
distribution function of $r_1, r_2, \ldots, r_m$ say. This distribution function is derived
below by an extension of the method used by Koopmans [2].

It should be noted that Bartlett [3] has shown that for large samples the
variances and covariances of the $r_i$ are independent of the distribution of $\varepsilon_i$
under fairly wide conditions. This means that the joint distribution function
obtained for normal $\varepsilon_i$ will often give a good approximation for non-normal $\varepsilon_i$
and can be used as the basis for any test of the correlogram.

3. Conditions on the $r_i$. It is easily seen that the $r_i$ cannot take all values
from +1 to -1 independently. For example, $r_2$ cannot take a value near -1
if $r_1$ takes a value near +1. As a result, there will be certain necessary conditions
that the $r_i$ will have to fulfil. It is not difficult to find these conditions, since, if
$y_i$ $(i = 1, 2, \ldots, n)$ are any set of variables, then

(1) $\sum_i (\varepsilon_{i+j}\, y_j)^2 = \Big(\sum_i \varepsilon_i^2\Big)\, r_j\, y_i\, y_{i+j},$

where $\varepsilon_i$ may or may not be corrected for the mean and the double-suffix summation
convention is employed.


Thus, provided $0 < m < n/2$, we will have

(2) $R_m = \begin{vmatrix} 1 & r_1 & r_2 & \cdots & r_m \\ r_1 & 1 & r_1 & \cdots & r_{m-1} \\ r_2 & r_1 & 1 & \cdots & r_{m-2} \\ \vdots & \vdots & \vdots & & \vdots \\ r_m & r_{m-1} & r_{m-2} & \cdots & 1 \end{vmatrix} \ge 0$

as a necessary condition that the right-hand side of (1) be positive definite and
this expression will impose necessary conditions upon the joint distribution
of the $r_i$.

Fig. 1 gives the limits of possible values of $r_1$ and $r_2$ subject to (a) no restriction,
(b) $r_3 = 0$, (c) $r_3 = r_4 = 0$.
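For m = 2 the determinant condition reduces to $(1 - r_2)(1 + r_2 - 2r_1^2) \ge 0$, that is, $r_2 \ge 2r_1^2 - 1$. The sketch below illustrates this numerically. It is not from the paper: it assumes the usual circular definition $r_j = \sum e_i e_{i+j} / \sum e_i^2$ with $e_{n+i} = e_i$ (the defining formula is not legible in the source), and the function names and the choice n = 9 are illustrative.

```python
import math
import random

def circular_serial_correlations(series, m, correct_mean=True):
    """r_1..r_m with the circular convention e_(n+i) = e_i (assumed form)."""
    n = len(series)
    mean = sum(series) / n if correct_mean else 0.0
    e = [v - mean for v in series]
    p = sum(v * v for v in e)
    return [sum(e[i] * e[(i + j) % n] for i in range(n)) / p
            for j in range(1, m + 1)]

def toeplitz_det3(r1, r2):
    """Determinant of [[1, r1, r2], [r1, 1, r1], [r2, r1, 1]]."""
    return (1.0 - r2) * (1.0 + r2 - 2.0 * r1 * r1)

# Random series: the margin r2 - (2 r1^2 - 1) should never be negative.
rng = random.Random(1)
worst_margin = float("inf")
for _ in range(1000):
    series = [rng.gauss(0.0, 1.0) for _ in range(9)]
    r1, r2 = circular_serial_correlations(series, 2)
    worst_margin = min(worst_margin, r2 - (2.0 * r1 * r1 - 1.0))

# A pure cosine wave at a Fourier frequency lies exactly on the boundary curve.
wave = [math.cos(2.0 * math.pi * i / 9) for i in range(9)]
boundary = circular_serial_correlations(wave, 2)
print(worst_margin, boundary, toeplitz_det3(*boundary))
```

The cosine series gives the point $(\cos 2\pi/9, \cos 4\pi/9)$, one of the points on the curve $(\cos\theta, \cos 2\theta)$ that reappear in the reduction of the distribution integral.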


4. Complex Integration in m Variables. Before finding the joint distribution
function of the $r_i$ some introductory remarks on complex integration involving m
variables will be necessary.

We can evaluate an integral such as

$$\int_{-i\infty}^{i\infty} \cdots \int_{-i\infty}^{i\infty} \frac{f(z_1, z_2, \ldots, z_m)}{\prod_{j=1}^{m} (z_j - a_j)}\, dz_1 \cdots dz_m$$

where $\Re(a_i) = 0$ and $f(z_1, z_2, \ldots, z_m)$ is regular in the region $\Re(z_i) \ge 0$, by
successive Cauchy integrations, so the integral has a value $(2\pi i)^m f(a_1, \ldots, a_m)$.
In the same manner as for Cauchy integration, it will be possible to distort the
contours over which we integrate so that we can evaluate

$$\int_S \cdots \int \frac{f(z_1, \ldots, z_m)}{\prod_{j=1}^{m} (z_j - a_j)}\, dz_1 \cdots dz_m$$

provided that $f(z_1, \ldots, z_m)$ is regular in the region defined by S, and $(a_1, \ldots, a_m)$
is enclosed in this region.

More generally, if we have an integral of the form 



dzi--- dz„, 


,am) 


and we make the transformations $w_j = a_{ij} z_i$ and $b_j = a_{ij} c_i$, i.e. $W = AZ$,
$C = A^{-1}B$, it is possible, in the above manner, to evaluate the integral as

(3) $\pm \dfrac{(2\pi i)^m}{|A|}\, f(c_1, \ldots, c_m).$


Suppose we now consider the integral 



dzi • * • dZni , 


where $n > m$. We may select a set, $g_k$, of m equations $a_{ij} z_i = b_j$, and let
$A_k = [a_{ij}]$, $B_k = [b_j]$, $C_k = A_k^{-1} B_k = [c_{ik}]$. Then, we may carry out the integration
as previously, in this case, summing a series of terms for various combinations
of m equations out of the possible n. The value of the integral may then be
written


(4) 


(2irt')’* ± 


fjcik , • • • Cnk) 

^k\ II {dtiCik—bj)' 

Ir^k 


where the summation occurs over the points $(c_{1k}, c_{2k}, \ldots, c_{mk})$ lying in the
region defined by S, and the product term excludes the set of equations $g_k$. The
ambiguity of sign in (3) and (4) arises from the Jacobian, and the sign
must be chosen which makes the transformation of $dz_1, \ldots, dz_m$ yield a positive
element. It must be noted that it is possible to obtain several expansions of the
form (4) according to the convention that is employed in defining "enclosure"
for each of the variables.

5. Integral form for the joint distribution function. We can, without loss of
generality, assume $\sigma^2 = 1$. Suppose that

p -1- (g /n, „ = t ...... - (z: 

where $\varepsilon_1, \ldots, \varepsilon_n$ are independent, so that $r_j = q_j/p$. Then by a consideration
of n dimensional space, we can see that p is distributed independently of $r_1, \ldots, r_m$
so that their joint distribution can be written $g(p)\, h(r_1, \ldots, r_m)\, dp\, dr_1 \cdots dr_m$.
The joint distribution of p and $q_1, \ldots, q_m$ can thus be written


(5) /(pgi •••?«) dp dgi • • • dg* ^ 
where it is not difficult to see that 

(6) g(p) = 


_ g(p) 




dp dgi • • • dg«. 




We can now find the joint distribution of p and $q_1, \ldots, q_m$ by inverting the
characteristic function of these variables. This is given by


1 r* f "* 2^^ *1 

JL* ■ ■ ■ i ^ 


=—r 

= 1/1A I*, 


J exp —d«i • • • d*n, 


where 


«' = I«i, ft. • • • » <»I 


IA I = IJ (1 — 2fij — 

j-i 

= (1 — 2 * 1 ;)""^ n (1 — 

so that the joint distribution of p and gi, *' ■ , g* 


Kjl = COS 


2iej 

1 - 2tij ’ 


/(p, gi- • .gj = /, ■ ■ ■ / ^ } dij dtfi • • • d9* 

_/ (1 - 2zij)*ygy\ dKi • • • diCm . 


exp< - 





where S, is the region bounded by Ky = ± ^ replaced 

by region S enclosing the same set of singularities on the real hyperplane, and S 
can be chosen independent of n. Thus it will be possible to reverse the order of 

integration in (7) provided that / | 1 — 2iri converges, i.e. provided 

•'-00 


n 


> 2m + 3. Then since 




(p - Kygy)J<“-**-« 

_ 2m - A (“5(p - Kjq,)] for p>Kjqj, 


Qi(n--2m—l)y, / W 


‘■r(- 




= 0 for p < Ky^y, 

we get 

/(p, 9l • • • find = 


p—ip 


( 8 ) 




f f j I 

•/••*/ r n~l ~ -|§ aiCl • • • ClKm , 

* n (1 — «y«yi) 

_ »-i 


where S encloses the same singularities as $S_1$, all of which lie in the region
$p \ge \kappa_j q_j$. If we now use (5) and (6) we get



In a similar manner, it is possible to derive for $n > 2m + 3$ the joint distribution
of serial correlation coefficients, $\bar{r}_1, \ldots, \bar{r}_m$, uncorrected for the mean, in
the form


(10) h(fi • • • f„) = 



6. Extension for variables in an autoregressive scheme. Madow [4] has shown
how to extend the distribution of the serial correlation coefficient for uncorrelated
variables to the case when the variables $x_i$ are connected by a linear Markoff
scheme, $x_i = \rho x_{i-1} + \varepsilon_i$ with a normal distribution of the error $\varepsilon_i$. It is worth
noting that the method used by Madow can be applied to derive the joint
distribution of serial correlations of variables $x_i$, which are connected by a linear
autoregressive scheme of order m, or less,

$$a_0 x_i + a_1 x_{i-1} + \cdots + a_m x_{i-m} = \varepsilon_i,$$

where $\varepsilon_1, \ldots, \varepsilon_n$ are normally and independently distributed, and $\varepsilon_{n+i} = \varepsilon_i$.*
Under these conditions, the expression (9) will be modified by a factor


(11) $\dfrac{1}{(A + 2B_j r_j)^{\frac{1}{2}(n-1)}},$

where

$$A = \sum_{k=0}^{m} a_k^2, \qquad B_j = \sum_{k=0}^{m-j} a_k a_{k+j},$$

while (10) will be modified by a similar factor with n replacing $n - 1$.


7. Reduction of the distribution function integral. Using the method described
in section 4, it is now possible to reduce the integral given in (9), if we observe
that $\kappa_{ji} = \kappa_{j,n-i}$ and assume n odd. We then have


h(rv • 

^ \ _ 


- ^ ) 

i 

(1 

* ^m) — 

r 

— 2m — 

i\ (2«rj 


I 

u 



^ 2 

V 


(12) 


r/n-2\ 


1 / 

J(n 



2 ) 

_ "r 

r Kk 



r 1 

(n — 2m — 

1^ « 

1 

/ 


A 1 

1 2 

/ n 

if^gk 

’'yi 

Kk 


where / = (1, 1 , • • • , 1 ), r' = (n , rz, • • • , Vm), = (kji , • • • , k^i) and 
Kk is the matrix formed from a set gk of the m matrices arranged in order. 
The factors in the summation can most easily be determined if we put 

^ ^ cx: 2 l(ri, .. • , fm-i) — Tm and sum over the region for which fm < 

r Kk 

$\lambda(r_1, \ldots, r_{m-1})$. To demonstrate the manner in which formula (12) works, we
shall consider $m = 2$. From formula (2) we can see that a limit to the possible
values that $r_2$ can take is given by $r_2 = 2r_1^2 - 1$, i.e. by the curve $(\cos\theta, \cos 2\theta)$
in the $(r_1, r_2)$ plane. It is not difficult to see that there are ${}^{\frac{1}{2}(n-1)}C_2$ possible terms in
(12) and that each of these terms is proportional to the $\tfrac{1}{2}(n - 2m - 3)$th power
of the distance from a line in the $(r_1, r_2)$ plane. These lines are the joins of the
points $(\cos 2\pi i/n, \cos 4\pi i/n)$, $i = 1, \ldots, \tfrac{1}{2}(n - 1)$, and the joins of such points
on the curve $(\cos\theta, \cos 2\theta)$ give the outer limits of the possible values of $r_1$ and $r_2$.
It can also be seen that these points correspond to the equations $\kappa_j \kappa_{ji} = 1$ (each
of these equations determines a plane in 4-dimensional complex space), while
the joins of these points correspond to the singularities defined by, and terms
arising from, pairs of these equations. Furthermore, since the sum of residues in
any plane is zero, the sum of contributions, taken with appropriate signs, arising
from lines through any of these points is zero, i.e. the sum of all possible terms
involving any particular $\kappa_{ji}$ will disappear. This leads to several possible
expansions for $h(r_1, \ldots, r_m)$.

* This is a sufficient condition for $x_{n+i} = x_i$.

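If we consider the particular case $n = 9$, then each term in the expansion (12)
is proportional to the distance from one of the lines joining $(\cos 2\pi i/9, \cos 4\pi i/9)$,
$i = 1, 2, 3, 4$. These lines may be denoted by $l_{ij}$. Then the contribution from
$l_{ij}$ is given by

$$\frac{3\{\kappa_{1i}\kappa_{1j} - (\kappa_{1i} + \kappa_{1j})\, r_1 + \tfrac{1}{2}(r_2 + 1)\}}{(\kappa_{1j} - \kappa_{1i})(\kappa_{1i} - \kappa_{1k})(\kappa_{1i} - \kappa_{1l})(\kappa_{1j} - \kappa_{1k})(\kappa_{1j} - \kappa_{1l})},$$

where $j > i$, k and l are the two remaining indices, and $\kappa_{1i} = \cos(2\pi i/9)$.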

The values of this expression are:

$l_{12}$: $-1.979 + 2.938\, r_1 - 1.563\, r_2$,

$l_{13}$: $0.926 - 2.106\, r_1 + 3.959\, r_2$,

$l_{14}$: $1.053 - 0.832\, r_1 - 2.396\, r_2$,

$l_{23}$: $-5.012 - 3.959\, r_1 - 6.065\, r_2$,

$l_{24}$: $3.033 + 6.897\, r_1 + 4.502\, r_2$,

$l_{34}$: $-4.086 - 6.065\, r_1 - 2.106\, r_2$,

where, for example, the contribution from $l_{12}$ acts in the region for which
$1.563\, r_2 < -1.979 + 2.938\, r_1$. Fig. 2 demonstrates the configuration for this
case. It is seen that the frequency surface is a tetrahedron. As particular examples
of the identities mentioned above we have

$$l_{12} + l_{13} + l_{14} = 0,$$

$$-l_{12} + l_{23} + l_{24} = 0,$$

$$-l_{13} - l_{23} + l_{34} = 0.$$
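The tabulated contributions for n = 9 can be checked numerically. The sketch below is an illustration, not part of the paper. It assumes that each contribution equals 3 times the numerator $\kappa_{1i}\kappa_{1j} - (\kappa_{1i} + \kappa_{1j}) r_1 + \tfrac{1}{2}(r_2 + 1)$ divided by $(\kappa_{1j} - \kappa_{1i})$ and the products $(\kappa_{1i} - \kappa_{1k})(\kappa_{1j} - \kappa_{1k})$ over the two remaining indices k; the factor 3 and this form of the denominator are inferred from agreement with the tabulated values. The third identity is taken with the sign pattern $-l_{13} - l_{23} + l_{34}$, which is the combination of lines through the third point that the tabulated coefficients satisfy.

```python
import math
from itertools import combinations

n = 9
kappa = {i: math.cos(2 * math.pi * i / n) for i in range(1, 5)}   # kappa_{1i}

def line_coefficients(i, j):
    """(constant, r1 coefficient, r2 coefficient) of the contribution from l_ij.
    The factor 3 and the denominator are assumptions matched to the table."""
    denom = kappa[j] - kappa[i]
    for k in kappa:
        if k not in (i, j):
            denom *= (kappa[i] - kappa[k]) * (kappa[j] - kappa[k])
    return (3 * (kappa[i] * kappa[j] + 0.5) / denom,
            -3 * (kappa[i] + kappa[j]) / denom,
            1.5 / denom)

# Values as tabulated in the text: constant, coefficient of r1, coefficient of r2.
tabulated = {
    (1, 2): (-1.979, 2.938, -1.563),
    (1, 3): (0.926, -2.106, 3.959),
    (1, 4): (1.053, -0.832, -2.396),
    (2, 3): (-5.012, -3.959, -6.065),
    (2, 4): (3.033, 6.897, 4.502),
    (3, 4): (-4.086, -6.065, -2.106),
}
computed = {pair: line_coefficients(*pair) for pair in combinations(range(1, 5), 2)}
max_gap = max(abs(c - t) for pair in tabulated
              for c, t in zip(computed[pair], tabulated[pair]))

def combine(signed_pairs):
    """Signed sum of contributions, coefficient by coefficient."""
    return [sum(sign * computed[pair][k] for sign, pair in signed_pairs)
            for k in range(3)]

through_1 = combine([(+1, (1, 2)), (+1, (1, 3)), (+1, (1, 4))])
through_2 = combine([(-1, (1, 2)), (+1, (2, 3)), (+1, (2, 4))])
through_3 = combine([(-1, (1, 3)), (-1, (2, 3)), (+1, (3, 4))])
print(max_gap, through_1, through_2, through_3)
```

The largest gap between computed and tabulated coefficients should be of the order of the three-decimal rounding in the table, and the three signed sums should vanish to rounding error.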

For a general value of m, we shall find that the hyperplanes joining sets of m
points $(\cos 2\pi i/n, \cos 4\pi i/n, \ldots, \cos 2\pi m i/n)$ will be singularities on the
frequency hypersurface. The hyperplanes passing through sets of m successive
points will give the limits of possible values of $r_1, \ldots, r_m$. Furthermore, the
sum of contributions (with appropriate signs) to the frequency function from
the set of $\tfrac{1}{2}(n - 2m + 1)$ hyperplanes passing through any point will be zero.

8. Integral approximation for the distribution function. The expression (12)
is, of course, difficult to use in practice and we require an approximation similar
to that of Koopmans. For this we make use of the integral expression (10)
for the joint distribution function of $\bar{r}_1, \ldots, \bar{r}_m$ and approximate to the factor
involving $\prod_i (1 - \kappa_j \kappa_{ji})$. This can be done without undue difficulty, but the resulting
multiple integral does not appear to be capable of easy reduction. This is hardly
surprising, since from the nature of the distribution of the $r_i$ we should expect
this approximation to involve $R_m$ raised to a suitable power, and this conjecture
is strengthened by the following considerations:

a) The distribution of $\bar{r}_1$ may be obtained by considering the two sets of
observations $x_1, x_2, \ldots, x_{n-1}, x_n$ and $x_2, x_3, \ldots, x_n, x_1$ as unrelated, and using
the distribution of the ordinary correlation coefficient corresponding to $n + 3$
pairs of observations (Dixon [6], Quenouille [7]). In the same manner, the m sets
of observations $x_1, x_2, \ldots, x_{n-1}, x_n$; $x_2, x_3, \ldots, x_n, x_1$; $\ldots$; $x_m, x_{m+1}, \ldots,
x_{m-2}, x_{m-1}$, might be considered as unrelated and the joint distribution of their
correlations, given by Garding [5], will involve $R_m$ raised to a suitable power.

b) The outer limits for the joint distribution of $r_1, r_2, \ldots, r_m$ or $\bar{r}_1, \bar{r}_2, \ldots, \bar{r}_m$
for large n, will be provided by the equations $R_p = 0$ $(p = 1, \ldots, m)$. An
investigation of the properties of the functions $R_1, R_2, \ldots, R_m$ might therefore
be expected to throw light upon the joint distribution of $r_1, r_2, \ldots, r_m$ or
$\bar{r}_1, \bar{r}_2, \ldots, \bar{r}_m$.

c) $R_p$ is a quadratic in $r_p$ and may be put equal to $R_{p-2}(r''_p - r_p)(r_p - r'_p)$,
where $r'_p$ and $r''_p$ are functions of $r_1, r_2, \dots, r_{p-1}$ giving the limits of the values
that $r_p$ can take for any particular values of $r_1, \dots, r_{p-1}$. Let $Q_p = R_p/R_{p-1}$,
then $Q_p$ is likewise a quadratic in $r_p$, taking all values between $r'_p$ and $r''_p$, and



B(s + 1, h) ( r'p - 

Qu A 2 ; • 


But, by expanding $R_p$ as a bordered determinant, it is not difficult to show that
$r''_p - r'_p = 2Q_{p-1}$, so that



r(s + 1) j 
r(s + 1)”' 


Q 


»+i 

p—1 • 


In particular, if 


(13) 


/(n- • -rj 


r(^n + !)• • •r(^n - m + 2) _ J_ 
r(iw + I) • • ~ + I) ’ 


and if we integrate with respect to $r_m, r_{m-1}, \dots, r_2$ in turn, we get


/, • • • • • • r„) 


dr„ 


dr-t 


+ 1 ) 


which is the approximate distribution of the first serial correlation coefficient, 
uncorrected for the mean, as given by Dixon [6]. 

The importance of this lies in the fact that the integral corresponding to that
of Koopmans for the joint distribution is


r(^) 

r(iw — m) 


f .. f 

II” Ja ' J 


1 1 

Jn—m—1 

r X 



Y |‘» 


n 


sin \nxi 


0 


dxi 


*(a:.) 


I 

X 


dxi • • • dxm 





where f - [n, ••• , r*], 


X = 


Y = 


cos Xi cos Xj • • • cos Xm 

cos 2 xi cos 2x2 • • • cos 2xm 

L cos mxi cos mx 2 • • • cos mxm 

[1,1, 1], 

1 1 

cos Xi cos X2 


1 

cos Xm 


1_ cos (m — l)xi cos (m — l)x2 
= [cos 0 , cos 20 , • • • , cos m 0 ]. 


cos (m — l)Xm J 


and S is the region given by 


1 I 
rX 


> 0. This suggests, by analogy, that the 


joint distribution function is a polynomial in Vn of degree 2{\n — m — 1 ) + 3 = 
n — 2m + 1 which vanishes only when = 0. The equation satisfies these 
conditions, and in addition, it reduces to the known form when $m = 1$ and can
be integrated to give this same form. Thus there is a strong suggestion that (13)
gives an approximate distribution of $r_1, r_2, \dots, r_m$, uncorrected for the mean.

An alternative form for the constant factor in (13) may be obtained if we
note that


r(4n — m + 2) 


1 r(n — 2 to + 3) 


r(in - m + DttI 2’-*"+* [r(in - m + |)1*' 

d) Now $r'_p$ and $r''_p$ can be written in the forms $(S_{p-1} + R_{p-1})/R_{p-2}$ and
$(S_{p-1} - R_{p-1})/R_{p-2}$, where


Thus 




n 

n 

r3 

... 0 


_!)>-• 

1 

n 

r2 

• * • rp-1 

.1 = ( 

ri 

1 

n 

• • • rp-2 



rp-2 

rp-3 

rp-i 

ri 

Rp-2 ( 

^Sp-i + Rp^i 

-A 

(r - 

/Sp-i — R 

^ Rp-2 

7 

\rp 

Rp-2 

RU 1 


p Rp^2 


A1 


Rp-2\ 

- V 

R> 


/J 





where 


Qp — Qp-l(l — 

^l,p+1.23... = Tp^i/Rp^l , 








and 

Tp n n • • • rp_i 

rp-i 1 n • • • rp_2 

T p—l = Tp—i Ti 1 • • • I’p—3 


I ^’l Tp-2 Tp-i • • • 1 

Therefore, if we make a change of variable to ri,p_i,23..., ri.p.23..., • • • ^3.2, ri, 
we find that the new variables which correspond exactly to partial correlation 
coefficients are, in fact, independently distributed as such, with 3 degrees of 
freedom more than in the case where the sets of variables are distinct
observations.

While the above properties do not prove that the $r_i$ or $\tilde r_i$ may be tested
using partial or multiple correlation coefficients, this conjecture has been verified
elsewhere and it has been shown [8] that, with certain adjustments, a test can be
derived which is applicable to fairly short series.

REFERENCES 

[1] R. L. Anderson, "Distribution of the serial correlation coefficient," Annals of Math. Stat., Vol. 13 (1942), pp. 1-13.

[2] T. Koopmans, "Serial correlation and quadratic forms in normal variables," Annals of Math. Stat., Vol. 13 (1942), pp. 14-33.

[3] M. S. Bartlett, "On the theoretical specification of sampling properties of autocorrelated time series," Roy. Stat. Soc. Suppl., Vol. 8 (1946), pp. 27-41.

[4] W. G. Madow, "Note on the distribution of the serial correlation coefficient," Annals of Math. Stat., Vol. 16 (1945), pp. 308-310.

[5] L. Garding, Proceedings of Lund University Mathematical Seminars, Vol. 5, pp. 185-202.

[6] W. J. Dixon, "Further contributions to the problem of serial correlation," Annals of Math. Stat., Vol. 15 (1944), pp. 119-144.

[7] M. H. Quenouille, "Some results in the testing of the serial correlation coefficient," Biometrika, Vol. 35 (1948), pp. 261-267.

[8] M. H. Quenouille, "Approximate tests of correlation in time series 1," Roy. Stat. Soc. Suppl., Vol. 11 (1949).




ON THE ESTIMATION OF THE NUMBER OF CLASSES IN A 

POPULATION¹

By Leo A. Goodman 
Princeton University 


1. Summary. This paper deals with the following problem: Suppose a population
of known size N is subdivided into an unknown number of mutually exclusive
classes. It is assumed that the class in which an element is contained may be
determined, but that the classes are not ordered. Let us draw a random sample
of n elements without replacement from the population. The problem is to
estimate the total number K of classes which subdivide the population on the
basis of the sample results and our knowledge of the population size.

There is exactly one real valued statistic S which is an unbiased estimate of K
when the sample size n is not less than the maximum number q of elements
contained in any class. The restriction placed upon q is unimportant for many
practical problems where either there is a reasonably low bound for q or those
classes containing more than n elements are known. An unbiased estimate does
not exist when there is no such knowledge.

Since the unbiased estimate can be very unreasonable, modifications of S are
considered. The statistic

$$T' = \begin{cases} S' = N - \dfrac{N(N-1)}{n(n-1)}\, x_2, & \text{if } S' \ge \sum_{i=1}^{n} x_i, \\[1ex] \sum_{i=1}^{n} x_i, & \text{if } S' < \sum_{i=1}^{n} x_i, \end{cases}$$

where $x_i$ is the number of classes containing i elements in the sample,
is the most suitable estimate, in comparison with three other statistics, for a
hypothetical population.

The case where each element in the population has an equal and independent 
chance of coming into the sample is used as a model for some sampling procedures 
and also as an approximation to the case of random sampling. 


2. Introduction. The problem discussed may be described in terms of colored
balls in an urn. How should we estimate the number of colors present in the urn
on the basis of both the sample which gives the number of, say, white balls, red
balls, etc., and our knowledge of the total number of balls in the urn?

The following practical cases illustrate some of the ways in which this problem 
presents itself: 

(1) A company has received a large number of requests for a free sample of
its product. It is known that the same people often send more than one request.
From a sample of the requests we wish to estimate how many different people
have sent requests.²

¹ Prepared in connection with research sponsored by the Office of Naval Research.

(2) The Social Security Board possesses a large collection of Social Security 
cards. It is known that some people obtain different cards when they change 
jobs. From a sample of the cards it is desired to estimate how many different
people have Social Security cards.³

(3) A person who sells durable commodities anticipates opening a store 
which is to be located at a highway intersection. He would like to know how 
many different automobiles pass through the intersection in a given time period. 
The total number of automobiles may be easily observed but some probably 
pass through more than once. This type of inquiry is also useful to advertising 
agencies which must decide the most efficient location for billboards. 

(4) The State Unemployment Compensation Board possesses a large list 
of the people receiving unemployment benefits. It is desired to estimate the 
total number of families benefiting from the insurance program on the basis of a 
random sample of the people named on the list. 

(5) The number of words in a book may be easily estimated and a sample can
be taken. The problem of estimating the number of different words in a book is
another analogue of the general problem.⁴

3. Results and derivations. In order to show that an unbiased estimate of the 
number of classes in a population exists when the sample size n is not less than 
the maximum number q of elements contained in any class, we need prove the 
following two statements: 

Lemma 1. Suppose we have K classes of N similar elements with $n_1$ elements in
class 1, $n_2$ elements in class 2, $\dots$, $n_K$ elements in class K. The class of an element
is readily identifiable when the element is examined. Let

$$q = \max_i\,(n_i).$$

Suppose a random sample is drawn without replacement. If $x_i$ is the number of
classes containing i elements in the sample, and $K_j$ is the number of classes containing
j elements in the population, then

$$E(x_i) = \sum_{j=i}^{q} \Pr(i \mid j, N, n)\, K_j,$$

where $\Pr(i \mid j, N, n)$ shall henceforth be an abbreviation of

$$\frac{C^{j}_{i}\; C^{N-j}_{n-i}}{C^{N}_{n}}.$$


² Submitted by Charles Callard to Questions and Answers, The American Statistician,
Vol. 3, No. 1, p. 23.

³ Mentioned to the author by Dr. J. Stevens Stock of Opinion Research Corporation.

⁴ Mentioned in a letter to the author from Frederick Mosteller of Harvard University.






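Proof. Let $y_s$ be the number of elements appearing in the sample from the s-th
class. The statement is proved by considering

$$E(x_i) = \sum_{s=1}^{K} E(\delta_{i y_s}), \quad \text{where} \quad \delta_{i y_s} = \begin{cases} 1, & \text{if } y_s = i, \\ 0, & \text{if } y_s \neq i. \end{cases}$$

Lemma 2. Let

$$a^{(t)} = \begin{cases} a(a-1)(a-2)\cdots(a-t+1), & \text{for } t > 0, \\ 1, & \text{for } t = 0, \end{cases} \qquad A_i = 1 - (-1)^i\, \frac{[N-n+i-1]^{(i)}}{n^{(i)}};$$

then $\sum_{i=1}^{j} A_i \Pr(i \mid j, N, n) = 1$.⁵

This result follows directly from the fact that

$$\sum_{t=0}^{i} (-1)^t\, C^{i}_{t}\, [N-n+t-1]^{(t)}\, [N-n]^{(i-t)} = 0, \quad \text{for } i \ge 1.$$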

The following theorem may be proved directly by the preceding lemmas:

Theorem 1. Suppose a sample of n elements is drawn without replacement from a
population of size N which is subdivided into K classes. Let

$$A_i = 1 - (-1)^i\, \frac{[N-n+i-1]^{(i)}}{n^{(i)}}.$$

If there are $x_i$ classes containing i elements in the sample, then

$$E\Big(\sum_{i=1}^{n} A_i x_i\Big) = K,$$

provided that n is not less than the maximum number q of elements contained in
any class in the population.
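
The unbiasedness asserted by Theorem 1 can be checked by brute force on a very small population. The sketch below is only an illustration under stated assumptions: the function names (`falling`, `coeff_A`, `estimate_S`, `exact_mean_S`) and the toy population are hypothetical and do not come from the paper, and the statistic is taken to be $\sum_i A_i x_i$ as in the theorem. Exact rational arithmetic is used so that the average over all samples can be compared with K without rounding.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def falling(a, t):
    """Descending factorial a^(t) = a(a-1)...(a-t+1), with a^(0) = 1."""
    out = 1
    for k in range(t):
        out *= a - k
    return out


def coeff_A(i, N, n):
    """A_i = 1 - (-1)^i [N-n+i-1]^(i) / n^(i), as in Theorem 1."""
    return 1 - Fraction((-1) ** i * falling(N - n + i - 1, i), falling(n, i))


def estimate_S(sample_labels, N):
    """sum_i A_i x_i, where x_i counts the classes seen exactly i times."""
    n = len(sample_labels)
    x = Counter(Counter(sample_labels).values())  # i -> x_i
    return sum(coeff_A(i, N, n) * xi for i, xi in x.items())


def exact_mean_S(population, n):
    """Average of the statistic over every sample of size n drawn without replacement."""
    N = len(population)
    samples = list(combinations(range(N), n))
    total = sum(estimate_S([population[k] for k in idx], N) for idx in samples)
    return total / len(samples)


# Hypothetical toy population: K = 4 classes, largest class has q = 3 elements.
population = ["a", "a", "a", "b", "b", "c", "d"]
print(exact_mean_S(population, 3))  # n = 3 >= q, so Theorem 1 applies
print(exact_mean_S(population, 2))  # n = 2 < q, outside the theorem's proviso
```

With n at least q the average should coincide with K = 4; with n = 2 the proviso fails and the average need not equal K.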

Theorem 2. There is at most one real valued statistic which is an unbiased
estimate of the number of classes in a population.⁶

Proof. Let us order the points of the sample space in the following manner:
Letting $x_i$ be the number of classes containing i elements in the sample, order
the sample points by increasing values of $x_n$; for equal values of $x_n$, order the
points by increasing values of $x_{n-1}$; for equal values of $x_{n-1}$, order the points
by increasing values of $x_{n-2}$; $\dots$; for equal values of $x_3$, order the points by
increasing $x_2$. Let

$$x_1 = n - \sum_{j=2}^{n} j\, x_j.$$

To prove the theorem, we must show that to each $O_i$ there corresponds a
unique value S(i), which must be the value of our estimate when $O_i$ is observed,
in order that the statistic be unbiased. To each

$$O_i = [x_1(i), x_2(i), x_3(i), \dots, x_n(i)],$$

let us associate the population

$$P_i = [N - n + x_1(i), x_2(i), x_3(i), \dots, x_n(i)].$$

If $P_1$ is the underlying population, then $O_i$ for all i > 1 will occur with a
probability of zero. Since there are N classes in $P_1$, the value of the statistic must be
S(1) = N whenever $O_1$ is observed in order that the estimate be unbiased.
The theorem may now be proved by induction.

⁵ The author is indebted to Professor Frederick F. Stephan of Princeton University for
a statement leading to a simplification of the original result.

⁶ This statement was mentioned to the author by M. P. Peisakoff of Princeton University.

Since all the $P_i$ used in the proof of Theorem 2 satisfied the condition that the
maximum number q of elements contained in any class be not more than the
sample size n, the statistic S is the only real valued statistic which is an unbiased
estimate when $q \le n$.

When the restriction that $q \le n$ is removed, it is useless to search for an
unbiased estimate since we have

Theorem 3. There does not exist an unbiased estimate of the number of classes
subdividing a population when it is not known whether the maximum number q of
elements contained in any class is not more than the sample size n.

By the preceding theorems it is clear that if an unbiased estimate exists it 
must equal S. However, S is generally not unbiased when n < q. 

Theorem 4. Suppose the statistics $S_1, S_2, \dots, S_n$ are the solutions of the system
of linear equations

$$x_i = \sum_{j=i}^{n} \Pr(i \mid j, N, n)\, S_j, \quad \text{for } i = 1, 2, \dots, n,$$

where $x_i$ is the number of classes containing i elements in a sample of size n from a
population of N elements. If $K_j$ is the number of classes containing j elements in the
population, then $E(S_j) = K_j$, for $j = 1, 2, \dots, n$ when n is not less than the
maximum number q of elements contained in any class.

Proof. We observe that the statement is certainly true for $j = q + 1, q + 2, \dots, n$, since

$$E(S_j) = K_j = 0, \quad \text{for } j = q + 1, q + 2, \dots, n.$$

The statement is also true for j = q, since

$$E(S_q) = \frac{E(x_q)}{\Pr(q \mid q, N, n)} = K_q.$$

To prove that $E(S_j) = K_j$ for any j < q, we assume it to be true for all i > j,
whereupon its truth for j follows.

By Theorems 2 and 3, it is clear that $\sum_{j=1}^{n} S_j = S$. Since

$$\sum_{j=1}^{q} j\, K_j = N,$$

it seems reasonable to ask whether the values of the estimates $S_1, S_2, \dots, S_n$
are in agreement with the known value of the size of the population. The unbiased
estimate of K can be shown to be internally consistent by
Theorem 5. Suppose a sample of size n is drawn without replacement from a
population of N elements which is divided into classes. If $x_i$ is the number of classes
containing i elements in the sample, and if the linear equations

$$x_i = \sum_{j=i}^{n} \Pr(i \mid j, N, n)\, S_j, \quad i = 1, 2, \dots, n,$$

are solved simultaneously for $S_j$, then

$$\sum_{j=1}^{n} j\, S_j = N.$$

The theorem follows readily from the fact that

$$\sum_{i=1}^{j} i \Pr(i \mid j, N, n) = \frac{nj}{N} \quad \text{and} \quad \sum_{i=1}^{n} i\, x_i = n.$$
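
Because $\Pr(i \mid j, N, n) = 0$ for $i > j$, the linear system of Theorems 4 and 5 is triangular and can be solved by back substitution, starting from $S_n$. The following sketch assumes the hypergeometric form of $\Pr(i \mid j, N, n)$ from Lemma 1; the function names `pr` and `solve_S` and the sample counts are hypothetical. It is meant only to illustrate the consistency relation $\sum_j j S_j = N$.

```python
from fractions import Fraction
from math import comb


def pr(i, j, N, n):
    """Hypergeometric Pr(i | j, N, n) = C(j,i) C(N-j,n-i) / C(N,n)."""
    return Fraction(comb(j, i) * comb(N - j, n - i), comb(N, n))


def solve_S(x, N):
    """Solve x_i = sum_{j>=i} Pr(i|j,N,n) S_j by back substitution.

    x maps i -> x_i (classes seen exactly i times); n is recovered as sum i*x_i.
    """
    n = sum(i * xi for i, xi in x.items())
    S = {}
    for i in range(n, 0, -1):
        # Contribution of the already-solved S_j with j > i.
        tail = sum(pr(i, j, N, n) * S[j] for j in range(i + 1, n + 1))
        S[i] = (Fraction(x.get(i, 0)) - tail) / pr(i, i, N, n)
    return S


# Hypothetical sample: three singleton classes and one class seen twice, N = 20.
x = {1: 3, 2: 1}
S = solve_S(x, N=20)
print(sum(j * Sj for j, Sj in S.items()))  # Theorem 5 says this equals N
print(sum(S.values()))                     # the sum of the S_j, i.e. the estimate S
```

Individual $S_j$ can be negative or far outside any plausible range, which is one concrete way to see why the text calls the unbiased estimate "very unreasonable" in some samples.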


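The variance of S may now be calculated by means of the formula

$$\sigma_S^2 = \sum_{i,j=1}^{n} \omega_{ij} A_i A_j = \sum_{i,j=1}^{n} A_i A_j \left[ \sum_{s \neq t} K_s K_t\, m_{st}(i, j) + \sum_{s=1}^{q} K_s (K_s - 1)\, m_{ss}(i, j) + \sum_{s=1}^{q} K_s\, m_s(i, j) \right],$$

where $\omega_{ij}$ is the covariance between $x_i$ and $x_j$, $m_{st}(i, j)$ is the covariance between
$\delta_{i y_r}$ and $\delta_{j y_h}$ when $r \neq h$, $n_r = s$ and $n_h = t$, and $m_s(i, j)$ is the covariance between
$\delta_{i y_r}$ and $\delta_{j y_r}$ when $n_r = s$.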

Since the statistic S can be very unreasonable, we consider other possible
estimates of K. The statistic

$$S' = N - \frac{N(N-1)}{n(n-1)}\, x_2$$

may be shown to be a modification of S which replaces the number of classes
containing i > 2 elements in the sample by an additional $i x_i$ classes, each
containing only one element. Since the values of $x_i$ for i > 2 are relatively small
in the practical problems of Section 2, S' might be used as an estimate.

Another statistic which may be used to estimate K is

$$S'' = \frac{N}{n} \sum_{i=1}^{n} x_i.$$

This statistic may be shown to overestimate K whenever $q \neq 1$. The estimate

$$S''' = \sum_{i=1}^{n} x_i$$

underestimates K when n < N − m where m is the least number of elements
contained in any class.

4. Binomial sampling. Let us suppose that each element from a population
of N elements has an equal and independent chance p = 1/r of entering the
sample. In this case, the size of the sample obtained is a random variable $\eta$
which is binomially distributed with mean Np. If a large random sample of n
elements is drawn without replacement from a large population of size N, then
the results when interpreted in terms of binomial samples where p = 1/r = n/N
are a good approximation to the results obtained by the usual model. Binomial
sampling may be considered a model of the case where one attempts to obtain
the sampling ratio p = 1/r by drawing simultaneously an uncounted sample of
elements which is estimated as being of the appropriate size.

In the case of binomial sampling, the statistic

$$B = \sum_{i} \left[ 1 - (1 - r)^i \right] x_i$$

may be shown to be an unbiased estimate of the number of classes in a population
from which binomial samples are drawn.

Let us now consider the statistic which corresponds to S' for the case of
binomial sampling; i.e.,

$$B' = N - r^2 x_2.$$

It may be shown that

$$E(B') = K_1 + K_2 + \sum_{j=3}^{q} \left[ j - C^{j}_{2}\, (1 - p)^{j-2} \right] K_j.$$

Hence, the statistic B' will underestimate K whenever

$$p < 1 - \left( \frac{2}{j} \right)^{1/(j-2)}, \quad \text{for } j = 3, 4, \dots, q.$$

Since $1 - (2/j)^{1/(j-2)}$ is a decreasing function of j for j > 2, when $p > \tfrac{1}{3}$, B' overestimates, and when
$p < 1 - (2/q)^{1/(q-2)}$, B' underestimates the value of K. When p is such that

$$1 - \left( \frac{2}{q} \right)^{1/(q-2)} < p < \frac{1}{3},$$

the expected value of B' is brought closer to K by underweighting some $K_j$
and overweighting others.
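
The direction of the bias of B' depends on how the coefficient of each $K_j$ in $E(B')$ compares with 1. The short sketch below tabulates that coefficient, $j - C^j_2 (1-p)^{j-2}$, together with the value of p at which it equals 1 (obtained by solving the inequality for p). The helper names `weight` and `threshold` are hypothetical, and the values of p and j are chosen only for illustration.

```python
from math import comb


def weight(j, p):
    """Coefficient of K_j in E(B') = sum_j [j - C(j,2) (1-p)^(j-2)] K_j."""
    return j - comb(j, 2) * (1 - p) ** (j - 2)


def threshold(j):
    """Value of p at which the coefficient of K_j (j >= 3) equals 1.

    For p below this value the coefficient is less than 1 (K_j is underweighted);
    for p above it the coefficient exceeds 1 (K_j is overweighted).
    """
    return 1 - (2 / j) ** (1 / (j - 2))


for j in range(3, 8):
    print(j, round(threshold(j), 4), round(weight(j, 0.1), 4), round(weight(j, 0.4), 4))
```

The thresholds start at 1/3 for j = 3 and decrease with j, so a sampling fraction above 1/3 overweights every class size j ≥ 3, while a fraction below the threshold for j = q underweights all of them.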

5. A hypothetical population.⁷ Suppose we draw a random sample of 1000
elements without replacement from a population of 10,000 elements where

$$K_1 = 9225, \quad K_2 = 336, \quad K_3 = 33, \quad K_4 = 1.$$

Hence, K = 9595. By means of Table 1, let us now compare on the basis of
binomial sampling the estimates which have been presented in the preceding
sections. Since N and n are large, these results are a good approximation to the
case of random sampling without replacement.
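
Some of the expected values for this population can be recomputed directly under the binomial model, using $E(x_i) = \sum_j K_j C^j_i p^i (1-p)^{j-i}$. The sketch below is an illustration only: the variable and function names are hypothetical, it covers just the unbiased statistic B, the statistic B', and the number of distinct classes observed, and it does not attempt the mean square errors.

```python
from math import comb

# Class-size counts of the hypothetical population: K[j] classes with j elements.
K = {1: 9225, 2: 336, 3: 33, 4: 1}
N = sum(j * k for j, k in K.items())  # population size
p = 1000 / N                          # sampling fraction
r = 1 / p


def expected_x(i, K, p):
    """E(x_i) under binomial sampling with inclusion probability p."""
    return sum(k * comb(j, i) * p**i * (1 - p) ** (j - i)
               for j, k in K.items() if j >= i)


q = max(K)
E_x = {i: expected_x(i, K, p) for i in range(1, q + 1)}

E_B = sum((1 - (1 - r) ** i) * E_x[i] for i in E_x)  # unbiased under binomial sampling
E_B_prime = N - r**2 * E_x[2]                        # binomial analogue of S'
E_distinct = sum(E_x.values())                       # expected number of classes seen

print(N, sum(K.values()))
print(round(E_B, 2), round(E_B_prime, 2), round(E_distinct, 2))
```

The first two quantities should agree with the S and S' rows of Table 1 after rounding, and the last with the S''' row.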


TABLE 1

Estimate    Expected value    Bias      √(Mean Square Error)
S           9595              0         347
S'          9570              -25       207
S''         9959              364       490
S'''        996               -8599     8600

It is clear that the best estimates of the number of classes in this particular
population are S or S', since S has the least bias, $E(S) - K$, and S' has the
least mean square error, $E(S' - K)^2$. One might argue that both S and S' are
statistics which are capable of giving nonsensical estimates. However, we
may decide to modify S or S' in order to always get reasonable estimates by
using the statistics

$$T = \begin{cases} S, & \text{if } N \ge S \ge \sum_{i=1}^{n} x_i, \\ N, & \text{if } S > N, \\ \sum_{i=1}^{n} x_i, & \text{if } S < \sum_{i=1}^{n} x_i, \end{cases}$$

$$T' = \begin{cases} S', & \text{if } S' \ge \sum_{i=1}^{n} x_i, \\ \sum_{i=1}^{n} x_i, & \text{if } S' < \sum_{i=1}^{n} x_i. \end{cases}$$
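
The modified statistics simply confine an estimate to the range that is logically possible: at least the number of distinct classes actually observed, and at most N. A minimal sketch follows, assuming $S' = N - \frac{N(N-1)}{n(n-1)} x_2$ as defined in Section 3; the function names and the sample counts are hypothetical.

```python
def s_prime(x, N):
    """S' = N - N(N-1)/(n(n-1)) * x_2; x maps i -> x_i and must give n >= 2."""
    n = sum(i * xi for i, xi in x.items())
    return N - N * (N - 1) / (n * (n - 1)) * x.get(2, 0)


def t_prime(x, N):
    """T': S' raised to the number of distinct classes seen when it falls below it."""
    return max(s_prime(x, N), sum(x.values()))


def clip_T(S, x, N):
    """T: an estimate S confined to [number of distinct classes seen, N]."""
    return min(max(S, sum(x.values())), N)


# Hypothetical sample of n = 100 from N = 1000: 98 singletons and one pair.
x = {1: 98, 2: 1}
print(s_prime(x, 1000), t_prime(x, 1000))
# Hypothetical small sample where S' is negative and T' falls back to the count seen.
print(s_prime({1: 8, 2: 1}, 100), t_prime({1: 8, 2: 1}, 100))
```

No upper clip is needed for T' because S' can never exceed N.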

⁷ Other examples have been investigated by Frederick Mosteller in Questions and
Answers, The American Statistician, Vol. 3, No. 3, p. 12.






Although these modified statistics T and T' are not unbiased, they have the
desirable property that

$$MSE(T) \le MSE(S), \quad \text{and} \quad MSE(T') \le MSE(S').$$

Since this hypothetical population is a plausible one for the practical problems
of Section 2, the modified statistics T or T' seem, therefore, to be "best" for
estimating the number of classes for these problems, where the "best" statistic
is defined as the one which never gives unreasonable estimates and has the least
mean square error.

The author wishes to express his appreciation to Professor John W. Tukey 
whose suggestions were very helpful. 



CONCERNING COMPOUND RANDOMIZATION IN THE BINARY 

SYSTEM 

By John E. Walsh 
The Rand Corporation 

1. Summary. Let us consider a set of approximately random binary digits 
obtained by some experimental process. This paper outlines a method of
compounding the digits of this set to obtain a smaller set of binary digits which is
much more nearly random. The method presented has the property that the 
number of digits in the compounded set is a reasonably large fraction (say of the 
magnitude i or i) of the original number of digits. 

If a set of very nearly random decimal digits is required, this can be obtained 
by first finding a set of very nearly random binary digits and then converting 
these digits to decimal digits. 

The concept of "maximum bias" is introduced to measure the degree of
randomness of a set of digits. A small maximum bias shows that the set is very
nearly random.

The question of when a table of approximately random digits can be considered 
suitable for use as a random digit table is investigated. It is found that a table 
will be satisfactory for the usual types of situations to which a random digit 
table is applied if the reciprocal of the number of digits in the table is noticeably 
greater than the maximum bias of the table. 

2. Introduction and discussion. With the development of the theory of games
and the more widespread use of experimental methods for determining approximate
distributions for statistics whose probability laws are difficult to obtain
analytically, a demand for large sets of random digits has arisen. The problem of
obtaining a set of digits which can be considered sufficiently random for the
situations to which it would be applied, however, is not an easy one. One approach
to this problem consists in obtaining a set of digits by some procedure and then
applying tests to this set of digits to determine whether it can be considered
satisfactory. Although appropriate choice of the tests may result in acceptance
of sets of digits which are suitable for certain special types of situations, this
approach is of a negative character and does not prove that a given set of digits
is sufficiently random; it merely indicates that this may be the case. What is
needed is a constructive approach to the problem, i.e., a method of constructing a
set of random digits which can be proved sufficiently random for most applications
if certain intuitively acceptable conditions are satisfied. A step in this
direction has already been taken by H. Burke Horton in [1] and by H. Burke
Horton and R. Tynes Smith III in [2]. This paper presents what is hoped will be
another step in this direction.

In this paper, considerations will be limited to the case of binary digits. The 
reasons for this are twofold: 


(a) The method used for compounding the digits yields a sharp upper
bound for the maximum bias of the compounded set (i.e., a bound that
the maximum bias could actually attain) only for the case of binary digits.

(b) Many of the experimental procedures for obtaining approximately
random digits consist in first producing binary digits and then converting
to another number base. Thus binary digits are produced directly.
Hence, to use the results of this paper, the only modification required in
these procedures would be to compound the binary digits before they
are converted.

Now let us consider some definitions: A set of random variables each of which
can assume only the values 0 and 1 will be referred to as a set of binary digits.
For convenience, each of the random variables making up a set of binary digits
will be called a binary digit; this is not to be confused with the value obtained
for the random variable. The absolute value of the deviation from 1/2 of the
conditional probability that a specified binary digit has the value 0 (or 1) is
called the bias of that digit for the given conditions on the remaining digits of
the set. The maximum bias of a binary digit is defined to be the maximum of the
biases of that digit with respect to all possible conditions on the remaining
digits of the set. The maximum bias of the set is the greatest of the maximum
biases of the digits of the set. A set of binary digits is said to be random if its
maximum bias is zero.
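
For a very small set whose joint distribution is known, the maximum bias can be computed by direct enumeration. The sketch below is an illustration under an explicit assumption: it conditions each digit on every complete assignment of values to the other digits, which is enough because a probability conditioned on any coarser event is a weighted average of these fully conditioned probabilities. The function name `maximum_bias` and the example distributions are hypothetical.

```python
from itertools import product


def maximum_bias(pmf):
    """Maximum bias of a set of binary digits.

    pmf maps each tuple of 0/1 values (one entry per digit) to its probability.
    Each digit is conditioned on every full assignment of the remaining digits
    that has positive probability.
    """
    N = len(next(iter(pmf)))
    worst = 0.0
    for k in range(N):                                # digit under consideration
        for rest in product((0, 1), repeat=N - 1):    # values of the other digits
            p0 = pmf.get(rest[:k] + (0,) + rest[k:], 0.0)
            p1 = pmf.get(rest[:k] + (1,) + rest[k:], 0.0)
            if p0 + p1 > 0:
                worst = max(worst, abs(p0 / (p0 + p1) - 0.5))
    return worst


# Two independent digits, each equal to 0 with probability 0.6.
independent = {(a, b): (0.6 if a == 0 else 0.4) * (0.6 if b == 0 else 0.4)
               for a in (0, 1) for b in (0, 1)}
# Two digits that always agree: each is fair on its own but fully determined by the other.
copied = {(0, 0): 0.5, (1, 1): 0.5}

print(maximum_bias(independent), maximum_bias(copied))
```

The second example shows why the definition conditions on the remaining digits: each digit is marginally unbiased, yet the set is as far from random as the definition allows.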

The method used to prove that a set of compounded digits has a sufficiently
small maximum bias is somewhat similar to the situation encountered in
mathematics where one begins with certain axioms and then draws conclusions. If the
axioms are correct, the conclusions are necessarily valid. The first step in the
compounding procedure consists in obtaining a set of binary digits by some
experimental process (perhaps from a random digit machine which is based on
some physical principle). The experimental process is so chosen that there is no
doubt that the set of binary digits produced satisfies the two conditions:

(i) The maximum bias of the set is less than or equal to some specified
value a (< 1/2).

(ii) The digits of the set can be arranged in a specified array which has the
property that the rows of the array are statistically independent.

On the basis of these two assumptions (which play the same role as the axioms
mentioned above), it can be proved that the maximum bias of the resulting
compounded set of binary digits never exceeds a specified value which depends
on a. Moreover, the upper bound for the maximum bias of the constructed set of
binary digits can be made extremely small even for large values of a.

If the experimental process is suitably chosen, conditions (i) and (ii) can be
satisfied beyond any doubt. For example, let us consider 1000 people located in
different parts of the world and not in contact with each other. Let each person
flip an ordinary coin high in the air so that it will land on a flat hard surface,
record the result (say 0 for a tail and 1 for a head), and then repeat this procedure
until 5000 binary digits are obtained. If a is set equal to 3/10, condition (i) is
obviously satisfied for the resulting set of 5,000,000 binary digits. Condition (ii)
evidently holds if the array is taken to consist of 1000 rows where each row
contains 5000 binary digits obtained from one person.

The ideal choice for a would be the actual maximum bias of the set of binary
digits obtained from the experimental process. Then the compounding procedure
for obtaining a set of digits with a specified upper bound for the maximum bias
would be simplified; also the number of digits in the compounded set would be a
larger fraction of the original number of digits. Invariably, however, the
properties of the experimental process are not known with sufficient accuracy for
obtaining anything but a safe upper bound on the maximum bias of the set of
digits produced. This situation is analogous to that of estimating the length
of a stick which a very rough measurement has shown to be about 10" long.
Although one might be very hesitant to believe that the length of the stick lies
between 9.9" and 10.1", the contention that the length lies between 5" and 15"
can be accepted with virtual certainty and any logical conclusions based on this
contention can also be accepted with virtual certainty.

Given the number of binary digits in a set and the maximum bias of the set,
is it possible to determine whether the set is suitable for use as a set of random
binary digits? An important consideration in answering this question is the use
that is to be made of the set of digits. This must always be taken into account
before the suitability of the set can be decided. For example, if no more than
1/1000 of the digits of the set are to be used for any particular situation, the
set might be satisfactory for the types of cases to which it would be applied;
on the other hand, the set might not be suitable for cases of these types if all the
digits of the set are used for each situation. This example calls attention to
an important point, namely that the suitability of a set of binary digits depends
on the number of digits in the set. Let a set have a fixed non-zero maximum
bias p. If the set contains a sufficiently large number N of digits, relations and
expressions involving the digits of the set can be found whose probabilities,
moments, etc., can differ greatly from the values which would be obtained if the
relations were based on the same number of truly random binary digits. As a
specific example consider the relation

All the digits of the set have the value zero.

If the reciprocal of the number of digits in the set is of the same order of
magnitude or smaller than the maximum bias of the set, the ratio of the probability
of this expression to its hypothetical value can differ noticeably from unity.
Thus, at least in certain special cases, a necessary condition for the suitability
of a set of binary digits is that $1/N \gg p$. This condition, however, is also
sufficient for most situations to which a set of random digits would be applied.
The approximate sufficiency of the condition is a direct consequence of the fact
that any set of N binary digits can be considered as a sample value from an
N-dimensional population consisting of $2^N$ discrete points. The $1/N \gg p$
restriction implies that the probability concentrated at each of the $2^N$ points is
very nearly equal to the hypothetical value of $(1/2)^N$ for all possible conditions
on the remaining digits of the set.

The $1/N \gg p$ condition is very satisfactory from the viewpoint of
probabilities. The probability of any relation based on a subset of the digits of the set
(possibly conditioned on other digits from the table) can be interpreted as the
sum of the probabilities of those points included in a certain region (defined by
the relation) of the N-dimensional probability space of the set of digits. By
expanding $(\tfrac{1}{2} \pm p)^N$ it can be shown that the ratio of the probability of any
relation based on one or more digits from the set to the corresponding value for a
truly random set of digits will be very nearly equal to unity if $1/N \gg p$.

It is evident that the higher order moments of an expression based on one or
more digits of the set can differ noticeably from its hypothetical value even if
$1/N \gg p$; any deviation from the ideal situation, no matter how small, can
become important for high order moments. For the first few moments, however,
deviations from the hypothetical values are not appreciable since these moments
are based on the probabilities at the $2^N$ points in the N-dimensional probability
space and these probabilities are very nearly equal to the hypothetical value of
$(1/2)^N$ in all cases.

The above discussion shows that the values of N and p are sufficient to
determine whether a set of binary digits is suitable for use as random binary digits
for a wide variety of situations. Analogous considerations apply for digits to any
number base.

A magnitude definition of the relation $1/N \gg p$ is difficult to specify. If p
is the upper bound for the maximum bias of a set of digits obtained by the
compounding procedure outlined in this paper, however, it seems that a
reasonable condition would be that $1/N > 50p$. This condition implies that the
probability of any relation based on digits of the set can not differ from its
hypothetical value by more than approximately 4%. In most practical
applications the value obtained for p would be noticeably greater than the
true value of the maximum bias of the compounded set.
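
The 4% figure can be checked numerically. If every conditional probability lies within p of 1/2, the probability of any single point of the N-dimensional space lies between $(\tfrac{1}{2} - p)^N$ and $(\tfrac{1}{2} + p)^N$, so its ratio to $(\tfrac{1}{2})^N$ lies between $(1 - 2p)^N$ and $(1 + 2p)^N$. The sketch below evaluates these two bounds at the boundary case $1/N = 50p$; the function name `ratio_bounds` and the particular value of p are hypothetical choices for illustration.

```python
def ratio_bounds(N, p):
    """Bounds on (probability of a point) / (1/2)^N when the maximum bias is p."""
    return (1 - 2 * p) ** N, (1 + 2 * p) ** N


p = 1e-8                    # assumed upper bound for the maximum bias
N = round(1 / (50 * p))     # largest N with 1/N >= 50 p, here 2,000,000 digits
low, high = ratio_bounds(N, p)
print(N, low, high)
```

Both bounds differ from unity by roughly 4%, in line with the statement in the text; with 1/N strictly greater than 50p the deviation is smaller.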

Since the maximum number of digits which can be taken from a table is the
total number of digits in the table, the above considerations suggest that a
random digit table should be constructed so that the reciprocal of the number of
digits in the table is noticeably greater than the maximum bias of the table.
Any table having this property would be satisfactory for most situations to
which it would be applied.

Now let us consider two different compounding methods which produce sets
of binary digits with the same upper bound for the maximum bias. If the
computational difficulties of applying the two methods are of comparable magnitudes,
it seems reasonable to prefer the method which yields the larger set of digits.
For example, if the number of digits in the set obtained by the first method is
only 1/8 of the original number of digits while the number in the set obtained
by the second method is 1/3 of the original number, the second method would
seem preferable even if it required as much as 100% more computation.





The compounding method presented in this paper has the property that the
number of digits in the compounded set can be held to a reasonably large fraction
of the original number of digits at the same time that the upper bound for the
maximum bias is made extremely small. The method presented by Horton in [1]
does not have this property. For example, let a = 1/10. Applying Horton's
method, when the compounded set consists of 1/8 of the original number of
digits the upper bound for the maximum bias is $12.8 \times 10^{-7}$. The example
presented in section 3, however, shows that a compounded set whose number of
digits equals 1/3 of the original number and which has an upper limit of
$11.7 \times 10^{-7}$ for the maximum bias can be obtained using the method presented in the
next section.

Although the compounding method outlined in section 3 is presented as a 
series of steps, the value of a digit of the compounded set can be written as a 
linear function (mod 2) of digits of the original set. This was not done in what 
follows because of the complicated nature of the general form of such expressions. 
In any particular case, however, these expressions can be written without much 
trouble and the compounded digits computed from the original digits in a 
single step. 

3. Outline of compounding method and statement of theorems. This section 
contains a description of the compounding method mentioned in the preceding 
two sections as well as statements of the basic theorems concerning this 
compounding method. Proofs of the results stated in this section are given in section 4. 

Let us consider the array of $mn$ binary digits 

(1)
$$\begin{array}{cccc}
x_{11}, & x_{12}, & \cdots, & x_{1n} \\
x_{21}, & x_{22}, & \cdots, & x_{2n} \\
\cdots & \cdots & \cdots & \cdots \\
x_{m1}, & x_{m2}, & \cdots, & x_{mn}
\end{array}$$

which satisfies conditions (i) and (ii); i.e., the maximum bias of the set (1) is 
less than or equal to $\alpha$ while a digit $x_{uv}$ is independent of a digit $x_{rs}$ if $r \neq u$ 
(if $r = u$, however, $x_{uv}$ is not necessarily independent of $x_{rs}$). 

Let a new set of $(m - 1)n$ binary digits 

(2) $y_{ij}$, $(i = 1, \ldots, m - 1;\ j = 1, \ldots, n)$ 

be formed as follows: 

$$y_{ij} = x_{ij} + x_{mj} \pmod 2, \qquad (i = 1, \ldots, m - 1;\ j = 1, \ldots, n).$$

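Then the biases of the $y_{ij}$ have the properties 

Theorem 1. Let $U$ be a specified set of $t - 1$ of $y_{1j}, \ldots, y_{(i-1)j}, y_{(i+1)j}, \ldots, y_{(m-1)j}$, 
$(1 \le t \le m - 1)$, while $V$ is a specified set of zero or more of the $y_{pq}$'s 
with $q \neq j$. Also let $B$ consist of the set of integers such that $p \in B$ if $y_{pj} \in U$. Then, 
if $\gamma_u$ = maximum bias for the set $x_{u1}, \ldots, x_{un}$, $(u = 1, \ldots, m)$, 

$$\left| \Pr(y_{ij} = 0 \mid U, V) - \tfrac12 \right| \le \gamma_i \left[ 1 - \prod_{k \in B} \frac{\tfrac12 - \gamma_k}{\tfrac12 + \gamma_k} \right] \Big/ \left[ 1 + \prod_{k \in B} \frac{\tfrac12 - \gamma_k}{\tfrac12 + \gamma_k} \right]$$

for all possible selections of $U$, $V$ and of the values for the digits of these sets. 

Corollary 1. If exactly $t - 1$ of $y_{1j}, \ldots, y_{(i-1)j}, y_{(i+1)j}, \ldots, y_{(m-1)j}$ have 
known values, the maximum bias of the binary digit $y_{ij}$ is less than or equal to 

$$\alpha \left[ 1 - (\tfrac12 - \alpha)^t/(\tfrac12 + \alpha)^t \right] \Big/ \left[ 1 + (\tfrac12 - \alpha)^t/(\tfrac12 + \alpha)^t \right].$$

Corollary 2. The maximum bias of the set (2) is less than or equal to 

$$\alpha \left[ 1 - (\tfrac12 - \alpha)^{m-1}/(\tfrac12 + \alpha)^{m-1} \right] \Big/ \left[ 1 + (\tfrac12 - \alpha)^{m-1}/(\tfrac12 + \alpha)^{m-1} \right].$$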

The basic operation in the method of compounding binary digits is outlined 
in the procedure given for obtaining the $y_{ij}$ from the $x_{uv}$. Let $m = (1 + t_1) \cdots 
(1 + t_K)$. Then a set of $t_1 \cdots t_K n$ binary digits can be obtained from the original 
set of $mn$ digits $x_{uv}$ by continually applying this basic procedure. The first step 
consists in dividing the rows of (1) into $(1 + t_2) \cdots (1 + t_K)$ sets each consisting 
of $(1 + t_1)$ rows in some specified fashion. Each of these sets is an array of 
$(1 + t_1) \times n$ binary digits for which the rows are independent. Apply the method 
used to obtain the $y_{ij}$ from the $x_{uv}$ to each $(1 + t_1) \times n$ array separately. Then 
each array yields a set of $t_1 n$ binary digits and there are $(1 + t_2) \cdots (1 + t_K)$ 
such sets. In each set arrange the $t_1 n$ digits into a single row in some specified 
manner. This furnishes a new array of $[(1 + t_2) \cdots (1 + t_K)] \times [t_1 n]$ binary 
digits for which the rows are independent. Repeat this procedure with respect to 
$t_2$, thus obtaining a new array of $[(1 + t_3) \cdots (1 + t_K)] \times [t_1 t_2 n]$ binary digits for 
which the rows are independent; etc., until a $(1 + t_K) \times (t_1 \cdots t_{K-1} n)$ binary 
digit array for which the rows are independent is obtained. Then form a set of 
binary digits $Y_{gh}$, $(g = 1, \ldots, t_K;\ h = 1, \ldots, t_1 \cdots t_{K-1} n)$, from this array in 
exactly the same manner that the $y_{ij}$ were obtained from the $x_{uv}$. Then the 
biases of the $Y_{gh}$ have the properties 
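Theorem 2. Let $\beta_0, \beta_1, \ldots, \beta_K$ be defined by $\beta_0 = \alpha$ and 

$$\beta_w = \beta_{w-1} \left[ 1 - (\tfrac12 - \beta_{w-1})^{t_w}/(\tfrac12 + \beta_{w-1})^{t_w} \right] \Big/ \left[ 1 + (\tfrac12 - \beta_{w-1})^{t_w}/(\tfrac12 + \beta_{w-1})^{t_w} \right], \qquad (w = 1, \ldots, K).$$

Then, if exactly $t - 1$ of $Y_{1h}, \ldots, Y_{(g-1)h}, Y_{(g+1)h}, \ldots, Y_{t_K h}$ have known values, 
$(1 \le t \le t_K)$, the maximum bias of the digit $Y_{gh}$ is less than or equal to 

$$\beta_{K-1} \left[ 1 - (\tfrac12 - \beta_{K-1})^{t}/(\tfrac12 + \beta_{K-1})^{t} \right] \Big/ \left[ 1 + (\tfrac12 - \beta_{K-1})^{t}/(\tfrac12 + \beta_{K-1})^{t} \right].$$

In particular, the maximum bias of the entire set of $Y_{gh}$ is less than or equal to 
$\beta_K$. Also 

(3)
$$\beta_{K-1} \left[ 1 - (\tfrac12 - \beta_{K-1})^{t}/(\tfrac12 + \beta_{K-1})^{t} \right] \Big/ \left[ 1 + (\tfrac12 - \beta_{K-1})^{t}/(\tfrac12 + \beta_{K-1})^{t} \right]
\le 2^{2^K - 1} \cdot t \cdot t_{K-1}^{2} \cdot t_{K-2}^{4} \cdots t_2^{2^{K-2}} \cdot t_1^{2^{K-1}} \cdot \alpha^{2^K}.$$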
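The repeated compounding described before Theorem 2 can be sketched in Python as follows. This is an illustration only, under stated assumptions: the basic operation is taken to add the last row of a block to each of the other rows (mod 2), which is how the proof in section 4.1 treats $x_{m1}$; rows are grouped consecutively and each block's digits are flattened row by row, which is just one of the "specified fashions" the text allows. The names `compound_once` and `compound` are hypothetical.

```python
def compound_once(block):
    """Basic operation: add the last row to each of the other rows, mod 2.

    Assumption: y[i][j] = (x[i][j] + x[m-1][j]) % 2 for i = 0, ..., m-2.
    """
    if len(block) < 2:
        raise ValueError("a block needs at least two rows")
    last = block[-1]
    return [[(a + b) % 2 for a, b in zip(row, last)] for row in block[:-1]]


def compound(x, ts):
    """Apply the basic operation once for each t_w in ts = (t_1, ..., t_K).

    x must have (1 + t_1)...(1 + t_K) rows of n binary digits each.
    Returns the t_K rows Y[g][h], each of length t_1...t_{K-1} * n.
    """
    expected_rows = 1
    for t in ts:
        expected_rows *= 1 + t
    if not ts or len(x) != expected_rows:
        raise ValueError("x must have (1 + t_1)...(1 + t_K) rows")
    rows = [list(r) for r in x]
    for t in ts[:-1]:
        size = 1 + t  # consecutive blocks of (1 + t_w) rows
        new_rows = []
        for start in range(0, len(rows), size):
            y = compound_once(rows[start:start + size])
            # Arrange the t_w * (row length) digits of this block in a single row.
            new_rows.append([d for r in y for d in r])
        rows = new_rows
    # Final stage: a (1 + t_K)-row array yields the t_K rows of Y.
    return compound_once(rows)


x = [[1, 0], [0, 0], [1, 1], [0, 1], [1, 1], [1, 0]]  # m = (1+1)(1+2) = 6, n = 2
print(compound(x, (1, 2)))  # [[1, 1], [1, 1]]
```

With $t_1 = 1$, $t_2 = 2$ and $n = 2$ the six rows reduce to $t_1 t_2 n = 4$ compounded digits, as the text states.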





The inequality (3) is frequently useful from a computational viewpoint. 
Although the right hand side of (3) is usually noticeably greater than the left 
hand side, in many cases this rough upper bound is itself small enough to show 
that the upper bound for the maximum bias is of the desired order of magnitude. 
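As a numerical illustration of how the recursion of Theorem 2 compares with the rough bound (3), the following sketch evaluates both. It assumes the recursion has the form $\beta_w = \beta_{w-1}[1 - r^{t_w}]/[1 + r^{t_w}]$ with $r = (\tfrac12 - \beta_{w-1})/(\tfrac12 + \beta_{w-1})$, and that the right-hand side of (3) follows from applying the relation $\le 2ta^2$ of section 4.2 repeatedly; the function names are hypothetical.

```python
def step(beta_prev, t):
    """One application of the Theorem 2 recursion with exponent t."""
    r = (0.5 - beta_prev) / (0.5 + beta_prev)
    return beta_prev * (1 - r**t) / (1 + r**t)


def beta_sequence(alpha, ts):
    """Return [beta_0, beta_1, ..., beta_K] for ts = (t_1, ..., t_K)."""
    betas = [alpha]
    for t in ts:
        betas.append(step(betas[-1], t))
    return betas


def rough_bound(alpha, ts):
    """Right-hand side of (3) with t = t_K:
    2**(2**K - 1) * t_K * t_{K-1}**2 * ... * t_1**(2**(K-1)) * alpha**(2**K).
    """
    K = len(ts)
    bound = 2.0 ** (2**K - 1) * alpha ** (2**K)
    for w, t in enumerate(ts, start=1):
        bound *= float(t) ** (2 ** (K - w))
    return bound


alpha, ts = 0.1, (1, 3, 10, 44)
print(beta_sequence(alpha, ts)[-1], rough_bound(alpha, ts))
```

For $\alpha = 1/10$ and $(t_1, t_2, t_3, t_4) = (1, 3, 10, 44)$ the two printed values come out near $1.15 \times 10^{-6}$ and $1.17 \times 10^{-6}$ respectively, so the rough bound is only slightly larger in this case.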

If the set of compounded digits is to be used for a random binary digit table, 
Theorem 2 shows that advantage can be taken of the position of the digits in the 
table. Let $M = t_1 \cdots t_{K-1} n$ and enter the values of the $Y_{gh}$, $(g = 1, \ldots, t_K$; 
$h = 1, \ldots, M)$, into the table in the order 

$$Y_{11}, Y_{12}, \ldots, Y_{1M}, Y_{21}, \ldots, Y_{2M}, Y_{31}, \ldots, Y_{t_K 1}, \ldots, Y_{t_K M}.$$

Then, if a set of digits is taken from this table in consecutive order ($Y_{11}$ follows 
$Y_{t_K M}$), the upper bound for the maximum bias of this set is dependent on the 
number $L$ of digits in the set. From Theorem 2, the maximum bias of a set of $L$ 
digits taken in consecutive order from a table formed in this manner is less 
than or equal to 

$$\beta_{K-1} \left[ 1 - (\tfrac12 - \beta_{K-1})^{t}/(\tfrac12 + \beta_{K-1})^{t} \right] \Big/ \left[ 1 + (\tfrac12 - \beta_{K-1})^{t}/(\tfrac12 + \beta_{K-1})^{t} \right]$$

for values of $L$ such that $(t - 1)M < L \le tM$, where $1 \le t \le t_K$. Thus, if a 
small set of digits is taken from this table in consecutive order, the upper bound 
for the maximum bias of this set will usually be noticeably smaller than the 
upper bound for the maximum bias of the table. Since many uses of a random 
digit table require only a small fraction of the total number of entries in this 
table, this property would seem to be desirable. It should be emphasized, however, 
that the maximum bias of a set taken from this table is always less than 
or equal to $\beta_K$ irrespective of the positions that the digits of the sets occupy in 
the table. Thus nothing is lost by constructing the table in this manner but 
something can be gained for small sets if the digits are taken from the table in 
consecutive order. 
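The dependence of the bound on the length $L$ of a consecutive run can be sketched as below. The sketch assumes the table is laid out with $M = t_1 \cdots t_{K-1} n$ digits per row of $Y$ and that $t$ is the integer with $(t - 1)M < L \le tM$; `run_bias_bound` is a hypothetical name.

```python
def run_bias_bound(alpha, ts, n, L):
    """Upper bound for the maximum bias of L consecutive table digits.

    ts = (t_1, ..., t_K); the table holds t_K * M digits, M = t_1...t_{K-1} * n.
    """
    *head, t_K = ts
    beta = alpha  # becomes beta_{K-1} after the loop
    M = n
    for t in head:
        r = (0.5 - beta) / (0.5 + beta)
        beta = beta * (1 - r**t) / (1 + r**t)
        M *= t
    if not 1 <= L <= t_K * M:
        raise ValueError("L must lie between 1 and t_K * M")
    t = (L + M - 1) // M  # smallest t with L <= t * M
    r = (0.5 - beta) / (0.5 + beta)
    return beta * (1 - r**t) / (1 + r**t)


for L in (1, 31, 1320):
    print(L, run_bias_bound(0.1, (1, 3, 10, 44), 1, L))
```

With these illustrative inputs the bound grows with $L$: a run that fits inside one block of $M$ digits gets a much smaller bound than the whole table, for which the bound equals $\beta_K$.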

Now let us consider situations in which it is required that the number of 
digits in the compounded set is at least a specified fraction, say $1/C$, of the 
original number $mn$ of binary digits. This requires that $K$ and $t_1, \ldots, t_K$ be 
chosen so that 

$$t_1 \cdots t_K/(1 + t_1) \cdots (1 + t_K) \ge 1/C.$$

Also, for given values of $K$ and $C$, it seems preferable to choose $t_1, \ldots, t_K$ so that 
the value of $\beta_K$ is at least approximately minimized. Examination of the results of 
Theorem 2 indicates that a reasonable method of determining the values of 
$t_1, \ldots, t_K$ with this in mind consists in first choosing $t_1$ as small as possible, then 
(given the value of $t_1$ equal to its minimum value) choosing $t_2$ as small as possible, 
etc. This method is also recommended by the fact that the resulting values of 
$t_1, \ldots, t_K$ are readily determined. The explicit procedure for finding $t_1, \ldots, t_K$ 
is given by 

Theorem 3. Let the values of the integer $K$ and the constant $C$ $(> 1)$ be given and 
consider the integers $t_1, \ldots, t_K$ subject to the condition 

$$t_1 \cdots t_K/(1 + t_1) \cdots (1 + t_K) \ge 1/C.$$

The minimum value of $t_1$ is the smallest integer satisfying 

$$t_1 > 1/(C - 1).$$

In general, $2 \le w < K$, having already determined $t_1, \ldots, t_{w-1}$ as their 
minimum values, the minimum value of $t_w$ is the smallest integer satisfying 

$$t_w > 1/[C t_1 \cdots t_{w-1}/(1 + t_1) \cdots (1 + t_{w-1}) - 1].$$

Finally, given $t_1, \ldots, t_{K-1}$ as their minimum values, the minimum value of $t_K$ 
is the smallest integer satisfying 

$$t_K \ge 1/[C t_1 \cdots t_{K-1}/(1 + t_1) \cdots (1 + t_{K-1}) - 1].$$

Now consider the general situation encountered in the application of the 
compounding process outlined above. Here the values of $\alpha$, $C$ are given and it is 
required to choose $K$ and $t_1, \ldots, t_K$ so that the upper bound for the maximum 
bias of the compounded set of $t_1 \cdots t_K n$ binary digits $Y_{gh}$ is less than or equal to a 
specified value $b$. The following procedure furnishes a method of solving this 
problem: 

Let $K = 1$, obtain $t_1$ according to Theorem 3, and then compute $\beta_1$. If $\beta_1 \le b$, a 
solution has been obtained. If $\beta_1 > b$, let $K = 2$ and repeat the procedure to 
obtain $\beta_2$. If $\beta_2 \le b$, the values of $t_1$, $t_2$ and $K = 2$ are a solution. If $\beta_2 > b$, 
repeat the procedure for $K = 3$; etc. In practical situations, the value of $K$ is 
usually bounded (e.g., by independence properties of the original set of digits). 
If $\beta_K$ is still greater than $b$ for the maximum permissible value of $K$, no solution is 
obtained. This means that either $b$ must be increased or $1/C$ decreased or both 
if a solution is to be found. In many cases, a large amount of computation can be 
avoided by using the inequality (3). For marginal situations, however, a solution 
may be missed by using (3) instead of computing $\beta_K$. 
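The selection rule of Theorem 3 and the search over $K$ just described can be sketched as follows. The sketch applies the inequalities literally (strict for $w < K$, non-strict for $w = K$) using exact rational arithmetic, so the values it returns need not coincide with every row of a hand-worked table; the function names and the cap `K_max` are hypothetical.

```python
from fractions import Fraction
from math import ceil, floor


def choose_ts(C, K):
    """Greedy choice of t_1, ..., t_K following the inequalities of Theorem 3."""
    C = Fraction(C)
    ts = []
    ratio = Fraction(1)  # t_1...t_{w-1} / ((1 + t_1)...(1 + t_{w-1}))
    for w in range(1, K + 1):
        denom = C * ratio - 1
        if denom <= 0:
            raise ValueError("requires C > 1")
        threshold = 1 / denom
        if w < K:
            t = floor(threshold) + 1  # smallest integer strictly greater
        else:
            t = max(1, ceil(threshold))  # smallest integer not less
        ts.append(t)
        ratio *= Fraction(t, 1 + t)
    return ts


def find_solution(alpha, C, b, K_max):
    """Increase K until beta_K <= b; return (K, ts, beta_K) or None."""
    for K in range(1, K_max + 1):
        ts = choose_ts(C, K)
        beta = alpha
        for t in ts:
            r = (0.5 - beta) / (0.5 + beta)
            beta = beta * (1 - r**t) / (1 + r**t)
        if beta <= b:
            return K, ts, beta
    return None


print(find_solution(0.1, 3, 2e-3, 5))
```

With $\alpha = 1/10$, $C = 3$ and an illustrative target $b = 2 \times 10^{-3}$, the search stops at $K = 2$ with $t_1 = 1$, $t_2 = 2$.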

Example of method. The following table represents an example of application 
of the above method: 

$\alpha = 1/10$, $1/C = 1/3$, $b = 2 \times 10^{[\text{exponent illegible}]}$ 

$K = 1$, $t_1 = 1$: $\beta_1 = 2 \times 10^{-2}$ 

$K = 2$, $t_1 = 1$, $t_2 = 2$: $\beta_2 < 1.6 \times 10^{-3}$ 

$K = 3$, $t_1 = 1$, $t_2 = 3$, $t_3 = 9$: $\beta_3 < 1.04 \times 10^{-4}$ 

$K = 4$, $t_1 = 1$, $t_2 = 3$, $t_3 = 10$, $t_4 = 44$: $\beta_4 < 1.17 \times 10^{-6}$. 

Thus $K = 4$, $t_1 = 1$, $t_2 = 3$, $t_3 = 10$, $t_4 = 44$ is a solution. 

4. Derivations. The purpose of this section is to furnish proofs of the results 
stated in the preceding sections. 

4.1 Proof of Theorem 1. Let us consider the conditional probability that an 
arbitrary but fixed $y_{ij}$ has a specified value when the values of a fixed subset of 
zero or more of the remaining $y$'s are known. For convenience, assume that $y_{11}$ 
is the binary digit considered and that the values of $y_{21}, y_{31}, \ldots, y_{t1}$ (where $t$ 
is a fixed integer such that $1 \le t \le m - 1$) and a set $S$ are given while the 
values of the remaining $y$'s are unknown. Here $S$ represents an arbitrary but 
fixed set of zero or more of the $y_{ij}$'s for which $j \ge 2$ while $t = 1$ has the 
interpretation that none of the $y_{i1}$, $(i \ge 2)$, are given. Let 

$$\Pr(x_{m1} = 0 \mid S) = \tfrac12 + a_{t+1} \quad \text{and} \quad \Pr(x_{k1} = b_k \mid S) = \tfrac12 + a_k, \qquad (k = 1, \ldots, t).$$

Then, using the independence conditions satisfied by the $x$'s, 

$$\Pr(y_{11} = b_1 \mid y_{21} = b_2, \ldots, y_{t1} = b_t;\ S)
= \left[ \prod_{k=1}^{t+1} (\tfrac12 + a_k) + \prod_{k=1}^{t+1} (\tfrac12 - a_k) \right] \Big/ \left[ \prod_{k=2}^{t+1} (\tfrac12 + a_k) + \prod_{k=2}^{t+1} (\tfrac12 - a_k) \right]$$

$$= \tfrac12 + a_1 \left[ \prod_{k=2}^{t+1} (\tfrac12 + a_k) - \prod_{k=2}^{t+1} (\tfrac12 - a_k) \right] \Big/ \left[ \prod_{k=2}^{t+1} (\tfrac12 + a_k) + \prod_{k=2}^{t+1} (\tfrac12 - a_k) \right]
= \tfrac12 + a_1 \delta.$$

Now $|\delta| = (1 - P)/(1 + P)$ if $0 \le P \le 1$ and equals $(P - 1)/(1 + P)$ if 
$P > 1$, where $P = \prod_{k=2}^{t+1} (\tfrac12 - a_k)/(\tfrac12 + a_k)$. Let $\gamma_u$ be the maximum bias for the 
set of binary digits $x_{u1}, \ldots, x_{un}$, $(u = 1, \ldots, m)$. Then it is easily seen that 

$$\max |\delta| \le \left[ 1 - \prod_k (\tfrac12 - \gamma_k)/(\tfrac12 + \gamma_k) \right] \Big/ \left[ 1 + \prod_k (\tfrac12 - \gamma_k)/(\tfrac12 + \gamma_k) \right].$$

Thus 

$$\left| \Pr(y_{11} = b_1 \mid y_{21} = b_2, \ldots, y_{t1} = b_t;\ S) - \tfrac12 \right|
\le \gamma_1 \left[ 1 - \prod_k (\tfrac12 - \gamma_k)/(\tfrac12 + \gamma_k) \right] \Big/ \left[ 1 + \prod_k (\tfrac12 - \gamma_k)/(\tfrac12 + \gamma_k) \right]$$

for all possible selections of $b_1, \ldots, b_t$ and all possible selections of $S$ and the 
values for the digits of $S$. It is to be observed that this inequality is valid for $t = 1$. 

Evidently this result can be modified to apply to an arbitrary $y_{ij}$ for which 
$t - 1$ of $y_{1j}, \ldots, y_{(i-1)j}, y_{(i+1)j}, \ldots, y_{(m-1)j}$ have given values. This obvious 
modification results in Theorem 1. 

4.2 Proof of Theorem 2. By Corollary 2, the maximum bias of the $[(1 + t_2) \cdots 
(1 + t_K)] \times [t_1 n]$ array is less than or equal to $\beta_1$. In general, $2 \le w \le K - 1$, by 
Corollary 2 the maximum bias of the $[(1 + t_{w+1}) \cdots (1 + t_K)] \times [t_1 \cdots t_w n]$ 
array is less than or equal to $\beta_w$. Finally, by Corollary 1, if exactly $t - 1$ of 
$Y_{1h}, \ldots, Y_{(g-1)h}, Y_{(g+1)h}, \ldots, Y_{t_K h}$ have known values, $(1 \le t \le t_K)$, the 
maximum bias for the binary digit $Y_{gh}$ is less than or equal to 

$$\beta_{K-1} \left[ 1 - (\tfrac12 - \beta_{K-1})^{t}/(\tfrac12 + \beta_{K-1})^{t} \right] \Big/ \left[ 1 + (\tfrac12 - \beta_{K-1})^{t}/(\tfrac12 + \beta_{K-1})^{t} \right].$$

The inequality (3) is an immediate consequence of the relation 

$$a \left[ 1 - (\tfrac12 - a)^t/(\tfrac12 + a)^t \right] \Big/ \left[ 1 + (\tfrac12 - a)^t/(\tfrac12 + a)^t \right] \le 2ta^2.$$

4.3 Proof of Theorem 3. From the given condition 

$$t_K \ge 1/[C t_1 \cdots t_{K-1}/(1 + t_1) \cdots (1 + t_{K-1}) - 1].$$

From this inequality for $t_K$ it follows that 

$$C t_1 \cdots t_{K-1}/(1 + t_1) \cdots (1 + t_{K-1}) - 1 > 0.$$

Thus 

$$t_{K-1} > 1/[C t_1 \cdots t_{K-2}/(1 + t_1) \cdots (1 + t_{K-2}) - 1].$$

In general, $3 \le w \le K - 1$, given 

$$t_w > 1/[C t_1 \cdots t_{w-1}/(1 + t_1) \cdots (1 + t_{w-1}) - 1]$$

it follows that 

$$C t_1 \cdots t_{w-1}/(1 + t_1) \cdots (1 + t_{w-1}) - 1 > 0$$

whence 

$$t_{w-1} > 1/[C t_1 \cdots t_{w-2}/(1 + t_1) \cdots (1 + t_{w-2}) - 1].$$

Finally 

$$t_1 > 1/(C - 1).$$


REFERENCES 

[1] H. Burke Horton, "A method for obtaining random numbers," Annals of Math. Stat., 
Vol. 19 (1948), pp. 81-85. 

[2] H. Burke Horton and R. Tynes Smith III, "A direct method for producing random 
digits in any number system," Annals of Math. Stat., Vol. 20 (1949), pp. 82-90. 



THE DISTRIBUTION OF EXTREME VALUES IN SAMPLES WHOSE 
MEMBERS ARE SUBJECT TO A MARKOFF CHAIN CONDITION 

By Benjamin Epstein 
Department of Mathematics, Wayne University 

1. Introduction. The extreme value problem as treated in the literature 
concerns itself with the following question: To find the distribution of the 
smallest, largest, or more generally the $j$th largest, or $j$th smallest values in 
random samples of size $n$, drawn from a distribution whose probability law is 
given by the d.f. $F(x)$. In this formulation the observed sample values $x_1, \ldots, x_n$ 
are assumed to be statistically independent. While the assumption of 
independence may be a good approximation to the true state of affairs in some 
cases, there are situations where this assumption is not justified. 

Suppose, for instance, that the observations in the sample are ordered in time. 
Then it may happen that successive observations are stochastically dependent, 
the extent of this dependence being a function of the time interval separating 
these observations.¹ In such cases the present distribution theory for extreme 
values in samples of size n is inadequate and must be replaced by more general 
results. 

It is clear that a clean-cut analytic solution to the problem of the distribution 
of extreme values in samples whose members may be stochastically dependent 
can be expected only for certain special kinds of dependence among successive 
observations. We are able, in this paper, to obtain the distribution of smallest, 
largest, second smallest, and second largest values in samples of size n drawn at 
equally spaced time intervals from a stationary Markoff process. 

2. The distribution of smallest and largest values in samples of size n drawn 
at equally spaced time intervals from a stationary Markoff process. In this 
section the following assumption is made: 

(A) observations $x_1, x_2, \ldots, x_n, \ldots$ are taken in order at times $t = 1$, 
$t = 2$, $\ldots$, $t = n$, $\ldots$ from a stationary Markoff random process. 

The only information needed in the investigation of a stationary Markoff 
process at integral values of time is the function 

(1) $F_2(x, y) = \operatorname{Prob}(x_i \le x,\ x_{i+1} \le y)$, 

independently of $i$, where $F_2(x, y)$ must be such that the marginal distribution 
obtained by integrating over $x$ or $y$ (if $x_i$ or $x_{i+1}$ take on a continuous range of 
values) or summing over the possible values of $x_i$ or $x_{i+1}$ (if $x_i$ and $x_{i+1}$ can take 
on only discrete values) is of the form 

(2) $F_1(x) = \operatorname{Prob}(x_i \le x)$, 

independently of $i$. 

¹ If the observations $x_1, x_2, \ldots, x_n, \ldots$ are taken at discrete times $t_1, t_2, \ldots, t_n, \ldots$ 
a measure of stochastic dependence between $x_i$ and $x_j$ is the ordinary coefficient of 
correlation $r_{ij}$. If the observations are taken from a continuous stochastic process a natural 
measure of stochastic dependence between observations made at two different times is the 
covariance function of the process. In this paper we shall limit ourselves to processes which 
are discrete in time. 

An example of a random process meeting condition A is furnished by the 
Ornstein-Uhlenbeck process [1; 2]. In this case the joint d.f. of $x_i$ and $x_{i+1}$ is 
given by a non-singular bivariate Gaussian distribution. The results in the 
present paper are stated completely in terms of the d.f.'s $F_2(x, y)$ and $F_1(x)$ 
defining the stationary Markoff process and will in particular be valid for 
observations taken at uniformly spaced time intervals from an Ornstein-Uhlenbeck 
process. 

In this section we shall find the distribution of smallest and largest values in 
samples $x_1, x_2, \ldots, x_n$ drawn from a random process under assumption A and 
specified by the bivariate d.f. $F_2(x, y)$ and the associated one dimensional 
marginal d.f. $F_1(x)$. We first prove Theorem I. 

Theorem I. Under assumption A, the distribution of largest values in samples of 
size $n$ is given by the d.f. $G_n^{(1)}(x) = [F_2(x, x)]^{n-1}/[F_1(x)]^{n-2}$. 

To prove this result we note that $G_n^{(1)}(x)$, the probability that the largest 
value in samples of size $n$ is $\le x$, is given by 

(3) $G_n^{(1)}(x) = \operatorname{Prob}(x_1 \le x,\ x_2 \le x,\ \ldots,\ x_n \le x)$. 

To evaluate the right-hand side of (3) we proceed as follows: 

(4) $\operatorname{Prob}(x_1 \le x, x_2 \le x, \ldots, x_n \le x) = \operatorname{Prob}(x_1 \le x, x_2 \le x, \ldots, x_{n-1} \le x)\ \operatorname{Prob}(x_n \le x \mid x_1 \le x, \ldots, x_{n-1} \le x)$. 

But under assumption A, (4) becomes 

(5) $\operatorname{Prob}(x_1 \le x, x_2 \le x, \ldots, x_n \le x) = \operatorname{Prob}(x_1 \le x, x_2 \le x, \ldots, x_{n-1} \le x)\ \operatorname{Prob}(x_n \le x \mid x_{n-1} \le x)$ 

or 

(5') $G_n^{(1)}(x) = G_{n-1}^{(1)}(x)\ \operatorname{Prob}(x_n \le x \mid x_{n-1} \le x)$. 

But according to assumption A, and (1) and (2) 

(6) $\operatorname{Prob}(x_n \le x \mid x_{n-1} \le x) = \operatorname{Prob}(x_{n-1} \le x, x_n \le x)/\operatorname{Prob}(x_{n-1} \le x) = F_2(x, x)/F_1(x)$. 

Therefore 

(7) $G_n^{(1)}(x) = G_{n-1}^{(1)}(x)\, F_2(x, x)/F_1(x) = G_1^{(1)}(x)\, (F_2(x, x))^{n-1}/(F_1(x))^{n-1} = (F_2(x, x))^{n-1}/(F_1(x))^{n-2}$. 

This proves Theorem I. 

For $n = 1$, 2, and 3 respectively one gets 

(8) $G_1^{(1)}(x) = F_1(x)$, $\quad G_2^{(1)}(x) = F_2(x, x)$, $\quad G_3^{(1)}(x) = (F_2(x, x))^2/F_1(x)$. 

Theorem II. Under assumption A, the distribution of smallest values in samples 
of size $n$ is given by the d.f. 

(9) $H_n^{(1)}(x) = 1 - \dfrac{[1 - 2F_1(x) + F_2(x, x)]^{n-1}}{[1 - F_1(x)]^{n-2}}$. 

To prove this result we first note that $H_n^{(1)}(x)$, the probability that the smallest 
value in samples of size $n$ be $\le x$, is given by 

$1 - \operatorname{Prob}(x_1 > x, x_2 > x, \ldots, x_n > x)$. 

To evaluate $H_n^{(1)}(x)$ we proceed as follows: 

(10) $\operatorname{Prob}(x_1 > x, x_2 > x, \ldots, x_n > x) = \operatorname{Prob}(x_1 > x, x_2 > x, \ldots, x_{n-1} > x)\ \operatorname{Prob}(x_n > x \mid x_1 > x, \ldots, x_{n-1} > x)$. 

But under assumption A, (10) becomes 

(11) $\operatorname{Prob}(x_1 > x, x_2 > x, \ldots, x_n > x) = \operatorname{Prob}(x_1 > x, x_2 > x, \ldots, x_{n-1} > x)\ \operatorname{Prob}(x_n > x \mid x_{n-1} > x)$. 

But 

(12) $\operatorname{Prob}(x_n > x \mid x_{n-1} > x) = \operatorname{Prob}(x_{n-1} > x, x_n > x)/\operatorname{Prob}(x_{n-1} > x)$. 

To evaluate $\operatorname{Prob}(x_{n-1} > x, x_n > x)$ we note that 

(13) $\operatorname{Prob}(x_{n-1} > x, x_n > x) + \operatorname{Prob}(x_{n-1} \le x, x_n > x) + \operatorname{Prob}(x_{n-1} > x, x_n \le x) + \operatorname{Prob}(x_{n-1} \le x, x_n \le x) = 1$. 

Also 

(14) $\operatorname{Prob}(x_{n-1} \le x, x_n > x) + \operatorname{Prob}(x_{n-1} \le x, x_n \le x) = \operatorname{Prob}(x_{n-1} \le x)$, 

and 

(15) $\operatorname{Prob}(x_{n-1} > x, x_n \le x) + \operatorname{Prob}(x_{n-1} \le x, x_n \le x) = \operatorname{Prob}(x_n \le x)$. 

Recalling that 

(16) $F_2(x, x) = \operatorname{Prob}(x_{n-1} \le x, x_n \le x)$ 

and 

(17) $F_1(x) = \operatorname{Prob}(x_{n-1} \le x) = \operatorname{Prob}(x_n \le x)$ 



we get 

(18) $\operatorname{Prob}(x_{n-1} > x, x_n > x) = 1 - 2F_1(x) + F_2(x, x)$. 

Therefore (10) becomes 

(19) $\operatorname{Prob}(x_1 > x, x_2 > x, \ldots, x_{n-1} > x, x_n > x) = \operatorname{Prob}(x_1 > x, x_2 > x, \ldots, x_{n-1} > x)\,[1 - 2F_1(x) + F_2(x, x)]/(1 - F_1(x))$. 

Applying the recursion formula (19) successively we obtain 

(20) $\operatorname{Prob}(x_1 > x, x_2 > x, \ldots, x_n > x) = \operatorname{Prob}(x_1 > x)\,[1 - 2F_1(x) + F_2(x, x)]^{n-1}/[1 - F_1(x)]^{n-1} = [1 - 2F_1(x) + F_2(x, x)]^{n-1}/[1 - F_1(x)]^{n-2}$. 

Therefore $H_n^{(1)}(x)$, the probability that the smallest value in samples of size $n$ 
is $\le x$, is given by: 

(21) $H_n^{(1)}(x) = 1 - \dfrac{[1 - 2F_1(x) + F_2(x, x)]^{n-1}}{[1 - F_1(x)]^{n-2}}$. 

This completes the proof of Theorem II. 

In particular for $n = 1$, 2, and 3 respectively the d.f.'s of the smallest value in 
samples of size $n$ are given by: 

(22) $H_1^{(1)}(x) = F_1(x)$, $\quad H_2^{(1)}(x) = 2F_1(x) - F_2(x, x)$, $\quad H_3^{(1)}(x) = 1 - \dfrac{[1 - 2F_1(x) + F_2(x, x)]^2}{1 - F_1(x)}$. 

3. Distribution of the second largest and second smallest values in samples 
of size n drawn at equally spaced time intervals from a stationary Markoff 
process. Under assumption A of Section II we can state the following theorem. 

Theorem III. Under assumption A the distribution of second largest values in 
samples of size $n$, $n \ge 2$, is given by the d.f. $G_n^{(2)}(x)$, 

$$G_n^{(2)}(x) = \frac{[F_2(x, x)]^{n-1}}{[F_1(x)]^{n-2}}
+ \frac{2[F_2(x, x)]^{n-2}\{F_1(x) - F_2(x, x)\}}{[F_1(x)]^{n-2}}
+ \frac{(n - 2)[F_2(x, x)]^{n-3}\{F_1(x) - F_2(x, x)\}^2}{[F_1(x)]^{n-3}(1 - F_1(x))}.$$

To prove this result we first note that $G_n^{(2)}(x)$, the probability that the second 
largest value is $\le x$, is given by 

(23)
$$\begin{aligned}
G_n^{(2)}(x) ={}& \operatorname{Prob}(x_1 \le x, x_2 \le x, \ldots, x_n \le x) \\
&+ \operatorname{Prob}(x_1 > x, x_2 \le x, x_3 \le x, \ldots, x_n \le x) \\
&+ \operatorname{Prob}(x_1 \le x, x_2 > x, x_3 \le x, x_4 \le x, \ldots, x_n \le x) + \cdots \\
&+ \operatorname{Prob}(x_1 \le x, x_2 \le x, \ldots, x_{n-2} \le x, x_{n-1} > x, x_n \le x) \\
&+ \operatorname{Prob}(x_1 \le x, x_2 \le x, \ldots, x_{n-1} \le x, x_n > x).
\end{aligned}$$

According to Theorem I 

(24) $\operatorname{Prob}(x_1 \le x, x_2 \le x, \ldots, x_n \le x) = [F_2(x, x)]^{n-1}/[F_1(x)]^{n-2}$. 

It can readily be shown that 

(25) $\operatorname{Prob}(x_1 > x, x_2 \le x, x_3 \le x, \ldots, x_n \le x) = \operatorname{Prob}(x_1 \le x, x_2 \le x, \ldots, x_{n-1} \le x, x_n > x) = [F_2(x, x)]^{n-2}\{F_1(x) - F_2(x, x)\}/[F_1(x)]^{n-2}$. 

It can also be shown that each of the remaining $(n - 2)$ terms on the right-hand 
side of (23) is equal to 

(26) $\dfrac{[F_2(x, x)]^{n-3}\{F_1(x) - F_2(x, x)\}^2}{[F_1(x)]^{n-3}(1 - F_1(x))}$. 

Combining (23), (24), (25), and (26) we get the desired result in Theorem 
III, i.e., 

(27)
$$G_n^{(2)}(x) = \frac{[F_2(x, x)]^{n-1}}{[F_1(x)]^{n-2}}
+ \frac{2[F_2(x, x)]^{n-2}\{F_1(x) - F_2(x, x)\}}{[F_1(x)]^{n-2}}
+ \frac{(n - 2)[F_2(x, x)]^{n-3}\{F_1(x) - F_2(x, x)\}^2}{[F_1(x)]^{n-3}(1 - F_1(x))}.$$


In a similar way one can prove Theorem IV. 

Theorem IV. Under assumption A, the distribution of second smallest values in 
samples of size $n$, $n \ge 2$, is given by the d.f. $H_n^{(2)}(x)$, 

(28)
$$H_n^{(2)}(x) = 1 - \frac{[1 - 2F_1(x) + F_2(x, x)]^{n-1}}{[1 - F_1(x)]^{n-2}}
- \frac{2[1 - 2F_1(x) + F_2(x, x)]^{n-2}\{F_1(x) - F_2(x, x)\}}{[1 - F_1(x)]^{n-2}}
- \frac{(n - 2)[1 - 2F_1(x) + F_2(x, x)]^{n-3}\{F_1(x) - F_2(x, x)\}^2}{[1 - F_1(x)]^{n-3}\, F_1(x)}.$$
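The formulas of Theorems III and IV can be evaluated in the same way. The sketch below assumes the exponent pattern $n-1$, $n-2$, $n-3$ in the three terms, which is what the term-by-term argument around (24) to (26) gives, and takes $F_1(x)$ and $F_2(x, x)$ as plain numbers strictly between 0 and 1; the function names are hypothetical.

```python
def second_largest_df(F1, F2, n):
    """G_n^(2)(x) for n >= 2, with F1 = F_1(x), F2 = F_2(x, x)."""
    D = F1 - F2  # Prob(one neighbour > x, the other <= x)
    return (F2 ** (n - 1) / F1 ** (n - 2)
            + 2 * F2 ** (n - 2) * D / F1 ** (n - 2)
            + (n - 2) * F2 ** (n - 3) * D**2 / (F1 ** (n - 3) * (1 - F1)))


def second_smallest_df(F1, F2, n):
    """H_n^(2)(x) for n >= 2, obtained by interchanging the roles of <= and >."""
    D = F1 - F2
    Q = 1 - 2 * F1 + F2  # Prob(both neighbours > x)
    return (1
            - Q ** (n - 1) / (1 - F1) ** (n - 2)
            - 2 * Q ** (n - 2) * D / (1 - F1) ** (n - 2)
            - (n - 2) * Q ** (n - 3) * D**2 / ((1 - F1) ** (n - 3) * F1))


# Independence check: with F_2(x, x) = F_1(x)**2 the second largest value is
# <= x exactly when at most one observation exceeds x.
F1, n = 0.6, 5
print(second_largest_df(F1, F1**2, n), F1**n + n * F1 ** (n - 1) * (1 - F1))
```

For $n = 2$ the second largest value is the smallest one, and the first function indeed reduces to $2F_1(x) - F_2(x, x)$, in agreement with (22).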


REFERENCES 

[1] J. L. Doob, "The Brownian movement and stochastic equations," Annals of 
Mathematics, Vol. 43 (1942), p. 351. 

[2] M. C. Wang and G. E. Uhlenbeck, "On the theory of the Brownian motion II," Reviews 
of Modern Physics, Vol. 17 (1945), p. 323. 



NOTES 

This section is devoted to brief research and expository articles and other short items. 


NOTE ON THE CONSISTENCY OF THE MAXIMUM LIKELIHOOD 

ESTIMATE¹ 

By Abraham Wald 

Columbia University 

1. Introduction. The problem of consistency of the maximum likelihood 
estimate has been treated in the literature by several authors (see, for example, 
Doob [1]² and Cramér [2]³). The purpose of this note is to give another proof of the 
consistency of the maximum likelihood estimate which may be of interest because 
of its relative simplicity and because of the easy verifiability of the underlying 
assumptions. The present proof has some common features with that given by 
Doob, insofar that both proofs make no differentiability assumptions (thus, not 
even the existence of the likelihood equation is postulated) and both are based 
on the strong law of large numbers and an inequality involving the log of a 
random variable. The assumptions in the present note are stronger in some 
respects than those made by Doob, but also the results obtained here are stronger. 
For the sake of simplicity, the author did not attempt to give the most general 
results or to weaken the underlying assumptions as much as possible. Remarks 
on possible generalizations are made in Section 4. 

Let $X_1, X_2, \ldots$, etc. be independently and identically distributed chance 
variables. The most frequently considered case in the literature is that where 
the common distribution is known, except for the values of a finite number of 
parameters, $\theta^1, \ldots, \theta^k$. In this note we shall treat the parametric case. For 
any parameter point $\theta = (\theta^1, \ldots, \theta^k)$, let $F(x, \theta)$ denote the corresponding 
cumulative distribution function of $X_i$; i.e., $F(x, \theta) = \text{prob.}\,\{X_i < x\}$. The 
totality $\Omega$ of all possible parameter points is called the parameter space. Thus, 
the parameter space $\Omega$ is a subset of the $k$-dimensional Cartesian space. 

¹ The author wishes to thank J. L. Doob for several comments and suggestions he made 
in connection with this note. 

² According to a communication from Doob, his Theorem 4 is incorrect, but is correct if 
the class of almost everywhere continuous functions in that theorem is replaced by a suitable 
class C of functions. The class C can be any one of a variety of classes; for example, the class 
of bounded almost everywhere continuous functions, or the larger class of almost everywhere 
continuous functions each of which is less than or equal in modulus to any one of a 
prescribed sequence of functions with finite expectations. His Theorem 5 on the consistency 
of the maximum likelihood is then dependent on the class C used in Theorem 4. 

³ The proof given by Cramér [2], pp. 500-504, establishes the consistency of some root 
of the likelihood equation but not necessarily that of the maximum likelihood estimate 
when the likelihood equation has several roots. Recently, Huzurbazar [3] showed that 
under certain regularity conditions the likelihood equation has at most one consistent 
solution and that the likelihood function has a relative maximum for such a solution. 
Since there may be several solutions for which the likelihood function has relative maxima, 
Cramér's and Huzurbazar's results taken together still do not imply that a solution of the 
likelihood equation which makes the likelihood function an absolute maximum is necessarily 
consistent. 

It is assumed in this note that for any $\theta$, the cumulative distribution function 
$F(x, \theta)$ admits an elementary probability law $f(x, \theta)$. If $F(x, \theta)$ is absolutely 
continuous, $f(x, \theta)$ denotes the density at $x$. If $F(x, \theta)$ is discrete, $f(x, \theta)$ is equal 
to the probability that $X_i = x$. 

Throughout this note the following assumptions will be made. 

Assumption 1. $F(x, \theta)$ is either discrete for all $\theta$ or is absolutely continuous 
for all $\theta$. 

Before formulating the next assumption, we shall introduce the following 
notations: for any $\theta$ and for any positive value $\rho$ let $f(x, \theta, \rho)$ be the supremum of 
$f(x, \theta')$ with respect to $\theta'$ when $|\theta - \theta'| \le \rho$. For any positive $r$, let $\varphi(x, r)$ 
be the supremum of $f(x, \theta)$ with respect to $\theta$ when $|\theta| > r$. Furthermore, let 
$f^*(x, \theta, \rho) = f(x, \theta, \rho)$ when $f(x, \theta, \rho) > 1$, and $= 1$ otherwise. Similarly, let 
$\varphi^*(x, r) = \varphi(x, r)$ when $\varphi(x, r) > 1$, and $= 1$ otherwise. 

Assumption 2. For sufficiently small $\rho$ and for sufficiently large $r$ the expected 
values $\int_{-\infty}^{\infty} \log f^*(x, \theta, \rho)\, dF(x, \theta_0)$ and $\int_{-\infty}^{\infty} \log \varphi^*(x, r)\, dF(x, \theta_0)$ are finite, where 
$\theta_0$ denotes the true parameter point.⁴ 

Assumption 3. If $\lim_{i \to \infty} \theta_i = \theta$, then $\lim_{i \to \infty} f(x, \theta_i) = f(x, \theta)$ for all $x$ except perhaps 
on a set which may depend on the limit point $\theta$ (but not on the sequence $\theta_i$) and 
whose probability measure is zero according to the probability distribution 
corresponding to the true parameter point $\theta_0$. 

Assumption 4. If $\theta_1$ is a parameter point different from the true parameter point 
$\theta_0$, then $F(x, \theta_1) \neq F(x, \theta_0)$ for at least one value of $x$. 

Assumption 5. If $\lim_{i \to \infty} |\theta_i| = \infty$, then $\lim_{i \to \infty} f(x, \theta_i) = 0$ for any $x$ except perhaps 
on a fixed set (independent of the sequence $\theta_i$) whose probability is zero according 
to the true parameter point $\theta_0$. 

Assumption 6. For the true parameter point $\theta_0$ we have 

$$\int_{-\infty}^{\infty} |\log f(x, \theta_0)|\, dF(x, \theta_0) < \infty.$$

Assumption 7. The parameter space $\Omega$ is a closed subset of the $k$-dimensional 
Cartesian space. 

Assumption 8. $f(x, \theta, \rho)$ is a measurable function of $x$ for any $\theta$ and $\rho$. 

It is of interest to note that if we forbid the dependence of the exceptional set 
on $\theta$ in Assumption 3, Assumption 8 is a consequence of Assumption 3, as can 
easily be verified. 

⁴ The measurability of the functions $f^*(x, \theta, \rho)$ and $\varphi^*(x, r)$ for any $\theta$, $\rho$ and $r$ follows 
easily from Assumption 8. 





In the discrete case, Assumption 8 is unnecessary. In fact, we may replace 
$f(x, \theta, \rho)$ everywhere by $\tilde f(x, \theta, \rho)$ where $\tilde f(x, \theta, \rho) = f(x, \theta, \rho)$ when $f(x, \theta_0) > 0$, 
and $\tilde f(x, \theta, \rho) = 1$ when $f(x, \theta_0) = 0$. Here $\theta_0$ denotes the true parameter point. 
Since $f(x, \theta_0) > 0$ only for countably many values of $x$, $\tilde f(x, \theta, \rho)$ is obviously a 
measurable function of $x$. 

In the absolutely continuous case, $F(x, \theta)$ does not determine $f(x, \theta)$ uniquely. 
If Assumptions 3, 5 and 8 hold for one choice of $f(x, \theta)$, they do not necessarily 
hold for another choice of $f(x, \theta)$. This is in a way undesirable, but assumptions 
of such nature are unavoidable if we want to insure the consistency of the 
maximum likelihood estimate. It is, however, possible to formulate assumptions 
which remain valid for all possible choices of $f(x, \theta)$ and which insure the 
consistency of the maximum likelihood estimate for a particular choice of $f(x, \theta)$. 
In this connection the following remark due to Doob is of interest. Let 
Assumptions 3' and 5' be the same as 3 and 5, respectively, except that the exceptional 
set is permitted to depend on the sequence $\theta_i$. If 3' and 5' hold for one choice of 
$f(x, \theta)$, they also hold for any other choice. Doob has shown that Assumptions 3' 
and 5' insure the existence of a choice of $f(x, \theta)$ for which Assumptions 3, 5 and 8 
hold. Thus, one may say that Assumptions 3' and 5' are the essential ones and 
the stronger assumptions 3, 5 and 8 are needed merely to exclude a "bad" 
choice of $f(x, \theta)$. 

2. Some lemmas. In this section we shall prove some lemmas which will be 
used in the next section to obtain the main theorems. Let $\theta_0$ be the true parameter 
point. By the expected value $Eu$ of any chance variable $u$ we shall mean the 
expected value determined under the assumption that $\theta_0$ is the true parameter 
point. For any chance variable $u$, $u'$ will denote the chance variable which is 
equal to $u$ when $u > 0$ and equal to zero otherwise. Similarly, for any chance 
variable $u$, the symbol $u''$ will be used to denote the chance variable which is 
equal to $u$ when $u < 0$ and equal to zero otherwise. We shall say that the expected 
value of $u$ exists if $Eu' < \infty$. If the expected value of $u'$ is finite but that of $u''$ 
is not, we shall say that the expected value of $u$ is equal to $-\infty$. 

Lemma 1. For any $\theta \neq \theta_0$ we have 

(1) $E \log f(X, \theta) < E \log f(X, \theta_0)$ 

where $X$ is a chance variable with the distribution $F(x, \theta_0)$. 

Proof. It follows from Assumption 2 that the expected values in (1) exist. 
Because of Assumption 6, we have 

(2) $E\,|\log f(X, \theta_0)| < \infty$. 

If $E \log f(X, \theta) = -\infty$, Lemma 1 obviously holds. Thus, we shall merely consider 
the case when $E \log f(X, \theta) > -\infty$. Then 

(3) $E\,|\log f(X, \theta)| < \infty$. 

Let $u = \log f(X, \theta) - \log f(X, \theta_0)$.⁵ Clearly, $E\,|u| < \infty$. It is known that for 
any chance variable $u$ which is not equal to a constant (with probability one) 
and for which $E\,|u| < \infty$, we have⁶ 

(4) $Eu < \log Ee^u$. 

Since in our case 

(5) $Ee^u \le 1$, 

and since $u$ differs from zero on a set of positive probability (due to Assumption 
4), we obtain from (4) 

(6) $Eu < 0$. 

Thus, Lemma 1 is proved. 
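Lemma 1 and the quantities appearing in (4) to (6) can be checked numerically for a simple discrete family. The sketch below uses a Bernoulli distribution with success probability $p$ in place of $\theta$, chosen purely for illustration and not checked against all eight assumptions; the function names are hypothetical.

```python
import math


def pmf(x, p):
    """Bernoulli probability f(x, p) for x in {0, 1}."""
    return p if x == 1 else 1 - p


def expected_log_f(p, p0):
    """E log f(X, p) when X has the Bernoulli(p0) distribution."""
    return sum(pmf(x, p0) * math.log(pmf(x, p)) for x in (0, 1))


def expected_exp_u(p, p0):
    """E e^u for u = log f(X, p) - log f(X, p0), expectation under p0."""
    return sum(pmf(x, p0) * pmf(x, p) / pmf(x, p0) for x in (0, 1))


p0 = 0.3
for p in (0.1, 0.2, 0.3, 0.4, 0.9):
    Eu = expected_log_f(p, p0) - expected_log_f(p0, p0)
    print(p, Eu, math.log(expected_exp_u(p, p0)))
```

For every $p \neq p_0$ the printed $Eu$ is negative while $\log Ee^u$ is zero (up to rounding), which is the content of (4), (5) and (6) in this example.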

We shall now prove the following lemma. 

Lemma 2. $\lim_{\rho \to 0} E \log f(X, \theta, \rho) = E \log f(X, \theta)$.

Proof. Let $f^*(x, \theta, \rho) = f(x, \theta, \rho)$ when $f(x, \theta, \rho) \ge 1$, and $= 1$ otherwise.
Similarly, let $f^*(x, \theta) = f(x, \theta)$ when $f(x, \theta) \ge 1$, and $= 1$ otherwise. It follows
from Assumption 3 that

(7) $\lim_{\rho \to 0} \log f^*(x, \theta, \rho) = \log f^*(x, \theta)$

except perhaps on a set whose probability measure is zero. Since $\log f^*(x, \theta, \rho)$
is an increasing function of $\rho$, it follows from (7) and Assumption 2 that

(8) $\lim_{\rho \to 0} E \log f^*(X, \theta, \rho) = E \log f^*(X, \theta)$.

Let $f^{**}(x, \theta, \rho) = f(x, \theta, \rho)$ when $f(x, \theta, \rho) \le 1$, and $= 1$ otherwise. Similarly, let
$f^{**}(x, \theta) = f(x, \theta)$ when $f(x, \theta) \le 1$, and $= 1$ otherwise. Clearly,

(9) $|\log f^{**}(x, \theta, \rho)| \le |\log f^{**}(x, \theta)|$

and

(10) $\lim_{\rho \to 0} \log f^{**}(x, \theta, \rho) = \log f^{**}(x, \theta)$

for all $x$ except perhaps on a set whose probability measure is zero. The relation

(11) $\lim_{\rho \to 0} E \log f^{**}(X, \theta, \rho) = E \log f^{**}(X, \theta)$

follows from (9) and (10) in both cases, when $E \log f^{**}(X, \theta)$ is finite and when
$E \log f^{**}(X, \theta) = -\infty$. Lemma 2 is an immediate consequence of (8) and (11).
Lemma 3. The equation

(12) $\lim_{r \to \infty} E \log \varphi(X, r) = -\infty$

holds.


¹ It is of no consequence what value is assigned to $u$ when $f(x, \theta)$ or $f(x, \theta_0)$ is zero, since
the probability of such an event, because of (3), is zero.

² This is a generalization of the inequality between geometric and arithmetic means.
See, for example, Hardy, Littlewood, Polya, Inequalities, Cambridge 1934, p. 137,
Theorem 184.





Proof. It follows from Assumption 5 that

(13) $\lim_{r \to \infty} \log \varphi(x, r) = -\infty$

for any $x$ (except perhaps on a set of probability 0). Since according to Assumption 2,

(14) $E \log \varphi^*(X, r) < \infty$

and since $\log \varphi(x, r) - \log \varphi^*(x, r)$ and $\log \varphi^*(x, r)$ are decreasing functions of
$r$, Lemma 3 follows easily from (13).


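3. The main theorems. We shall now prove the following theorems.
Theorem 1. Let $\omega$ be any closed subset of the parameter space $\Omega$ which does not
contain the true parameter point $\theta_0$. Then

(15) $\mathrm{prob}\left\{ \lim_{n \to \infty} \dfrac{\sup_{\theta \in \omega} f(X_1, \theta) f(X_2, \theta) \cdots f(X_n, \theta)}{f(X_1, \theta_0) f(X_2, \theta_0) \cdots f(X_n, \theta_0)} = 0 \right\} = 1$.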


Proof. Let $r_0$ be a positive number chosen such that

(16) $E \log \varphi(X, r_0) < E \log f(X, \theta_0)$.

The existence of such a positive number follows from Lemma 3. Let $\omega_1$ be the
subset of $\omega$ consisting of all points $\theta$ of $\omega$ for which $|\theta| \le r_0$. With each point $\theta$
in $\omega_1$ we associate a positive value $\rho_\theta$ such that

(17) $E \log f(X, \theta, \rho_\theta) < E \log f(X, \theta_0)$.

The existence of such a $\rho_\theta$ follows from Lemmas 1 and 2. Since the set $\omega_1$ is
compact, there exists a finite number of points $\theta_1, \cdots, \theta_h$ in $\omega_1$ such that
$S(\theta_1, \rho_{\theta_1}) + \cdots + S(\theta_h, \rho_{\theta_h})$ contains $\omega_1$ as a subset. Here $S(\theta, \rho)$ denotes the
sphere with center $\theta$ and radius $\rho$. Clearly,

$0 \le \sup_{\theta \in \omega} f(x_1, \theta) \cdots f(x_n, \theta) \le \sum_{i=1}^{h} f(x_1, \theta_i, \rho_{\theta_i}) \cdots f(x_n, \theta_i, \rho_{\theta_i}) + \varphi(x_1, r_0) \cdots \varphi(x_n, r_0)$.


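Hence, Theorem 1 is proved if we can show that

(18) $\mathrm{prob}\left\{ \lim_{n \to \infty} \dfrac{f(X_1, \theta_i, \rho_{\theta_i}) \cdots f(X_n, \theta_i, \rho_{\theta_i})}{f(X_1, \theta_0) \cdots f(X_n, \theta_0)} = 0 \right\} = 1 \qquad (i = 1, \cdots, h)$

and

(19) $\mathrm{prob}\left\{ \lim_{n \to \infty} \dfrac{\varphi(X_1, r_0) \cdots \varphi(X_n, r_0)}{f(X_1, \theta_0) \cdots f(X_n, \theta_0)} = 0 \right\} = 1$.

The above equations can be written as

(20) $\mathrm{prob}\left\{ \lim_{n \to \infty} \sum_{\alpha=1}^{n} [\log f(X_\alpha, \theta_i, \rho_{\theta_i}) - \log f(X_\alpha, \theta_0)] = -\infty \right\} = 1 \qquad (i = 1, \cdots, h)$

and

(21) $\mathrm{prob}\left\{ \lim_{n \to \infty} \sum_{\alpha=1}^{n} [\log \varphi(X_\alpha, r_0) - \log f(X_\alpha, \theta_0)] = -\infty \right\} = 1$.

These equations follow immediately from (16), (17) and the strong law of large
numbers. This completes the proof of Theorem 1.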

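Theorem 2. Let $\bar\theta_n(x_1, \cdots, x_n)$ be a function of the observations $x_1, \cdots, x_n$
such that

(22) $\dfrac{f(x_1, \bar\theta_n) \cdots f(x_n, \bar\theta_n)}{f(x_1, \theta_0) \cdots f(x_n, \theta_0)} \ge c > 0$ for all $n$ and for all $x_1, \cdots, x_n$.

Then

(23) $\mathrm{prob}\{\lim_{n \to \infty} \bar\theta_n = \theta_0\} = 1$.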


Proof. It is sufficient to prove that for any $\epsilon > 0$ the probability is one that all
limit points $\bar\theta$ of the sequence $\{\bar\theta_n\}$ satisfy the inequality $|\bar\theta - \theta_0| \le \epsilon$. The
event that there exists a limit point $\bar\theta$ of the sequence $\{\bar\theta_n\}$ such that $|\bar\theta - \theta_0| > \epsilon$
implies that $\sup_{|\theta - \theta_0| \ge \epsilon} f(x_1, \theta) \cdots f(x_n, \theta) \ge f(x_1, \bar\theta_n) \cdots f(x_n, \bar\theta_n)$ for infinitely
many $n$. But then

(24) $\dfrac{\sup_{|\theta - \theta_0| \ge \epsilon} f(x_1, \theta) \cdots f(x_n, \theta)}{f(x_1, \theta_0) \cdots f(x_n, \theta_0)} \ge c > 0$

for infinitely many $n$. Since, according to Theorem 1, this is an event with
probability zero, we have shown that the probability is one that all limit points
$\bar\theta$ of $\{\bar\theta_n\}$ satisfy the inequality $|\bar\theta - \theta_0| \le \epsilon$. This completes the proof of
Theorem 2.

Since a maximum likelihood estimate $\hat\theta_n(x_1, \cdots, x_n)$, if it exists, obviously
satisfies (22) with $c = 1$, Theorem 2 establishes the consistency of $\hat\theta_n(x_1, \cdots, x_n)$
as an estimate of $\theta$.
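
The convergence asserted by Theorem 2 can be watched in a small simulation. The sketch below is illustrative only and rests on choices not made in the paper: the exponential density $f(x, \theta) = \theta e^{-\theta x}$, the true value $\theta_0 = 2$, the sample sizes, and the helper names `exponential_mle` and `simulate_mle` are all assumptions of the example. For this density the maximum likelihood estimate has the closed form $n / \sum x_i$.

```python
import random


def exponential_mle(sample):
    """Maximum likelihood estimate of the rate theta for f(x, theta) = theta * exp(-theta * x)."""
    return len(sample) / sum(sample)


def simulate_mle(theta0, n, seed=0):
    """Draw n independent observations from f(x, theta0) and return the estimate."""
    rng = random.Random(seed)  # fixed seed so that the run is reproducible
    return exponential_mle([rng.expovariate(theta0) for _ in range(n)])


theta0 = 2.0
# Absolute error of the estimate for increasing sample sizes.
errors = [abs(simulate_mle(theta0, n) - theta0) for n in (100, 10_000, 200_000)]
print(errors)
```

A single simulated sequence cannot prove almost sure convergence; the run only shows the error becoming small for large $n$, as the theorem leads one to expect.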


4. Remarks on possible generalizations. The method given in this note can be
extended to establish the consistency of the maximum likelihood estimates for
certain types of dependent chance variables for which the strong law of large
numbers remains valid.

The assumption that the parameter space $\Omega$ is a subset of a finite dimensional
Cartesian space is unnecessarily restrictive. Let $\Omega$ be any abstract space. All of
our results can easily be shown to remain valid if Assumptions 3, 5 and 7 are
replaced by the following one:

Assumption 9. It is possible to introduce a distance $\delta(\theta_1, \theta_2)$ in the space $\Omega$ such
that the following four conditions hold:

(i) The distance $\delta(\theta_1, \theta_2)$ makes $\Omega$ a metric space.

(ii) $\lim_{i \to \infty} f(x, \theta_i) = f(x, \theta)$ if $\lim_{i \to \infty} \theta_i = \theta$, for any $x$ except perhaps on a set which
may depend on $\theta$ (but not on the sequence $\theta_i$) and whose probability measure is zero
according to the probability distribution corresponding to the true parameter point $\theta_0$.

(iii) If $\theta_0$ is a fixed point in $\Omega$ and $\lim_{i \to \infty} \delta(\theta_i, \theta_0) = \infty$, then $\lim_{i \to \infty} f(x, \theta_i) = 0$
for any $x$.

(iv) Any closed and bounded subset of $\Omega$ is compact.

REFERENCES 

[1] J. L. Doob, "Probability and statistics," Trans. Amer. Math. Soc., Vol. 36 (1934).

[2] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, Princeton,
1946.

[3] V. S. Huzurbazar, "The likelihood equation, consistency and the maxima of the
likelihood function," Annals of Eugenics, Vol. 14 (1948).


ON WALD'S PROOF OF THE CONSISTENCY OF THE MAXIMUM
LIKELIHOOD ESTIMATE

By J. Wolfowitz 
Columbia University 

This note is written by way of comment on the pretty and ingenious proof of
the consistency of the maximum likelihood estimate which is due to Wald and is
printed in the present issue of the Annals. The notation of this paper of Wald's
will henceforth be assumed unless the contrary is specified.

The consistency of the maximum likelihood estimate is a "weak" rather than
a "strong" property, in the technical meaning which these words have in the
theory of probability, i.e., it is a property of distribution functions rather than of
infinite sequences of observations. Prof. Wald actually proves strong convergence,
which is more than consistency. His proof uses the strong law of large numbers,
and he remarks that his method "can be extended to establish consistency of the
maximum likelihood estimates for certain types of dependent chance variables
for which the strong law of large numbers remains valid." Below we shall use
Wald's lemmas to give a proof of consistency which employs only the weak law
of large numbers. Not only does this proof have the advantage of being expeditious,
but it can be extended to a larger class of dependent chance variables.

The consistency of the maximum likelihood estimate follows from the following 

Theorem. Let $\eta$ and $\epsilon$ be given, arbitrarily small, positive numbers. Let $S(\theta_0, \eta)$
be the open sphere with center $\theta_0$ and radius $\eta$, and let $\Omega(\eta) = \Omega - S(\theta_0, \eta)$. Let
Wald's Assumptions 1-8 hold. There exists a number $h(\eta)$, $0 < h < 1$, and another
positive number $N(\eta, \epsilon)$ such that, for any $n > N(\eta, \epsilon)$,

$$P_0\left\{ \sup_{\theta \in \Omega(\eta)} \frac{\prod_{i=1}^{n} f(X_i, \theta)}{\prod_{i=1}^{n} f(X_i, \theta_0)} > h^n \right\} < \epsilon,$$

where $P_0$ is the probability of the relation in braces according to $f(x, \theta_0)$.

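Proof: Proceed exactly as in the proof of Wald's Theorem 1 and obtain $r_0$,
$\rho_{\theta_1}, \cdots, \rho_{\theta_h}$, so that the set theoretic sum of the open spheres $S(\theta_i, \rho_{\theta_i})$, $i = 1,
2, \cdots, h$, covers the compact set which is the intersection of $\Omega(\eta)$ with the
sphere $|\theta| \le r_0$. Define $T(\theta_i)$, $i = 1, \cdots, h + 1$, as follows:

$$-2T(\theta_i) = E \log f(X, \theta_i, \rho_{\theta_i}) - E \log f(X, \theta_0) \qquad (i = 1, \cdots, h)$$

$$-2T(\theta_{h+1}) = E \log \varphi(X, r_0) - E \log f(X, \theta_0).$$

If any of the right members above are infinite let $T(\theta_i)$ be one, say. Thus all
$T(\theta_i)$ are positive. Applying the weak law of large numbers we have that, for any
$i$ such that $1 \le i \le h + 1$, there exists a positive number $N_i$ such that, when $n > N_i$,

$$P_0\left\{ \frac{\prod_{j=1}^{n} f(X_j, \theta_i, \rho_{\theta_i})}{\prod_{j=1}^{n} f(X_j, \theta_0)} > \exp(-nT(\theta_i)) \right\} < \frac{\epsilon}{h + 1} \qquad (i = 1, \cdots, h)$$

$$P_0\left\{ \frac{\prod_{j=1}^{n} \varphi(X_j, r_0)}{\prod_{j=1}^{n} f(X_j, \theta_0)} > \exp(-nT(\theta_{h+1})) \right\} < \frac{\epsilon}{h + 1}.$$

From this the theorem follows immediately, with

$$N(\eta, \epsilon) = \max_i N_i, \qquad h(\eta) = \max_i \exp\{-T(\theta_i)\}.$$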

The author is obliged to Prof. Wald for his kindness in making his paper 
available to the author. 





A NOTE ON RANDOM WALK 

By Herbert T. David 

The Johns Hopkins University Institute for Cooperative Research 

A random walk is defined as a series of discrete steps along the real line, here
denoted by $I$. Each step is represented by the chance variable $X$, with sectionally
continuous density function $f(x)$. The walk begins at any point $a$ of $I$, and
continues until a step carries us outside some subregion $\Omega$ of $I$. In this note, $\Omega$ is
taken as a finite interval with upper bound $D$ and lower bound $D - y$. The
chance variables $N$ and $Z$ are, respectively, the number of steps required to end
the walk, and the endpoint of the walk. The range of $Z$ always excludes $\Omega$.

Below, we define $x = D - a$, and consider $E(N)$ as a function $G(x, y)$ of $x$
and $y$. Under specified conditions, a differential equation (32) is derived, relating
$G(0, y)$ and $G(x, y)$.

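Let

(1) $\psi_1(t) = f(t - a)$

(2) $\psi_n(t) = \int_{\Omega_1} \cdots (n - 1) \cdots \int_{\Omega_{n-1}} \prod_{i=1}^{n-1} f(g_i)\, f\bigl(t - a - \sum_{j=1}^{n-1} g_j\bigr)\, dg_1 \cdots dg_{n-1}; \quad n > 1$

where $\Omega_i = \bigl[g_i : a + \sum_{j=1}^{i} g_j \in \Omega\bigr]$ for $i$: $1, 2, \cdots, n - 1$.

Then

$P\{Z \in W_1, N = n\} = \int_{W_1} \psi_n(t)\, dt$ for $W_1$ disjoint from $\Omega$,

$P\{Z \in W_1, N = n\} = 0$ for $W_1 \subset \Omega$.

Hence

(3) $P\{N = n\} = \int_{I - \Omega} \psi_n(t)\, dt, \qquad E(N) = \sum_{n=1}^{\infty} n \int_{I - \Omega} \psi_n(t)\, dt$.

The transformation $[h_i = a + \sum_{j=1}^{i} g_j;\ i: 1, \cdots, n - 1]$ gives for $\psi_n(t)$
the more convenient expression

(4) $\psi_n(t) = \int_{\Omega} \cdots (n - 1) \cdots \int_{\Omega} f(h_1 - a) \prod_{i=2}^{n-1} f(h_i - h_{i-1})\, f(t - h_{n-1})\, dh_1 \cdots dh_{n-1}$.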





The n-fold integral $\int_I \psi_n(t)\, dt$ is absolutely convergent, hence may be integrated
first with respect to $t$. This gives, keeping the notation of (4),

(5) $\int_I \psi_n(t)\, dt = \int_{\Omega} \psi_{n-1}(h_{n-1})\, dh_{n-1}$.

Assuming that $E(N)$ remains finite for all considered $a$ and $\Omega$, series (3) may be
rearranged, giving: $E(N) = \sum_{i=1}^{\infty} B_i$ where

$B_i = \sum_{j=i}^{\infty} \int_{I - \Omega} \psi_j(t)\, dt$.

Now, $B_1 = \sum_{i=1}^{\infty} P\{N = i\} = 1$. Also, using (5) and induction on $n$, it is readily
shown that $B_n = \int_{\Omega} \psi_{n-1}(t)\, dt$, so that

(6) $E(N) = 1 + \sum_{j=1}^{\infty} \int_{\Omega} \psi_j(t)\, dt$.

Define transformations $\tau_n$: $[g_i = D - h_i,\ i: 1, \cdots, n - 1;\ g_n = D - t]$.

Substituting expressions (1) and (4) in (6), transform the $j$th term of the summation
by $\tau_j$. This gives

(7) $E(N) = 1 + \sum_{n=1}^{\infty} \int_0^y \cdots (n) \cdots \int_0^y f(x - g_1) \prod_{i=1}^{n-1} f(g_i - g_{i+1})\, dg_1 \cdots dg_n$

where $x = D - a$.

By (7), $E(N)$ is a function of $x$ and $y$, hence we write $E(N) = G(x, y)$.
Define:

$M(k)$ : Max $f(t)$ for $|t| < k$.

$K$ : Any number satisfying $K < [1 - \epsilon]/M(K)$.

$R$ : Any region $[-\infty < x < \infty;\ 0 < y < K]$.

$M$ : Max $f(t)$.

$L$ : Any number satisfying $L < [1 - \epsilon]/M$.

$R'$ : Any region $[-\infty < x < \infty;\ 0 < y < L]$.

In the ensuing argument, we shall assume that 

(8) (x, y) e R. 

This condition restricts certain one-dimensional and two-dimensional variables 
to regions over which some infinite series are uniformly convergent with respect 
to these variables. Uniform convergence is required to validate term-by-term 
differentiations and integrations, and to establish the continuity in one or two 
variables of certain functions represented by series. 

Arguments dealing with the solution of integral equations (17), (20) and (25)
are valid only under the more restrictive condition

(9) $(x, y) \in R'$

this being the general sufficiency condition for the existence of solutions. However,
(17) and (20) enter the argument with respect only to the derivation of
equation (21) which could have been derived, though in a more cumbersome
manner, by a term by term comparison of the series expressions for $[\lambda_{01}(x, y)][G(y, y)]$
and for $[G_{01}(x, y)][\lambda(y, y)]$, this latter approach being valid under (8).
Similarly, (26) is used only in obtaining (27), which could have been obtained
by a direct manipulation of the series expression for $G(x, y)$, this approach also
being valid under (8). Hence, all subsequent derivations hold, as long as $(x, y) \in R$.
By (8), we may interchange summation and integration with respect to $g_1$
in (7). This gives

(10) $G(x, y) = 1 + \int_0^y f(x - g)\, G(g, y)\, dg$.
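
Equation (10) lends itself to a direct numerical treatment. The sketch below is an illustration under stated assumptions, not the author's procedure: it takes $f$ to be the standard normal density and $y = 1$ (so that $y \cdot \mathrm{Max} f(t) < 1$, in the spirit of condition (9)), replaces the integral by the trapezoidal rule, and iterates (10) as a fixed-point map. The names `normal_density` and `expected_steps`, the grid size, and the number of sweeps are hypothetical choices.

```python
import math


def normal_density(t):
    """Standard normal density, used here as an example of a step density f."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def expected_steps(f, y, nodes=101, sweeps=60):
    """Approximate G(g, y) for g on a uniform grid of [0, y] by iterating equation (10)."""
    h = y / (nodes - 1)
    grid = [k * h for k in range(nodes)]
    weights = [h] * nodes            # trapezoidal rule weights
    weights[0] = weights[-1] = h / 2.0
    # kernel[i][j] approximates f(x_i - g_j) dg_j
    kernel = [[weights[j] * f(grid[i] - grid[j]) for j in range(nodes)] for i in range(nodes)]
    G = [1.0] * nodes                # first term of the series (7)
    for _ in range(sweeps):
        G = [1.0 + sum(kernel[i][j] * G[j] for j in range(nodes)) for i in range(nodes)]
    return grid, G


grid, G = expected_steps(normal_density, 1.0)
print(G[0], G[len(G) // 2], G[-1])
```

Each sweep adds one further term of the series (7), so the iteration converges geometrically when $y \cdot \mathrm{Max} f(t) < 1$. Because the density chosen here is even, the computed values at the two ends of the grid agree, which is the relation $G(0, y) = G(y, y)$ obtained later in the note as (24).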

(11) Assume that $f(t)$ has a continuous derivative everywhere.

Then $f(t)$ is continuous and $G(x, y)$ is continuous by (7) and (8). Hence

(12) $f(x - g)G(g, y)$ and $\frac{\partial}{\partial x} f(x - g)G(g, y)$ are continuous in $(x, g)$

(13) $f(x - g)G(g, y)$ is continuous in $(g, y)$.

Let $G_{ij}(x, y)$ denote $\dfrac{\partial^{i+j} G(x, y)}{\partial x^i\, \partial y^j}$.

Then, by (12), we may differentiate (10) with respect to $x$, and, since
$f_{10}(x - g) = -f_{01}(x - g)$, an integration by parts yields

(14) $G_{10}(x, y) = f(x)G(0, y) - f(x - y)G(y, y) + \int_0^y f(x - g)\, G_{10}(g, y)\, dg$.

Further, under (8), $G_{01}(x, y)$ may be obtained by differentiating (7) term by
term, and is continuous in $(x, y)$. Hence, $f(x - g)G_{01}(g, y)$ is continuous in $(g, y)$,
and we may differentiate (10) with respect to $y$, giving

(15) $G_{01}(x, y) = f(x - y)G(y, y) + \int_0^y f(x - g)\, G_{01}(g, y)\, dg$.

Adding (14) to (15), dividing by $G(0, y)$ which is always greater or equal to 1,
and letting

(16) $\lambda(x, y) = [G_{10}(x, y) + G_{01}(x, y)]/G(0, y)$

we obtain

(17) $\lambda(x, y) = f(x) + \int_0^y f(x - g)\, \lambda(g, y)\, dg$.

Under (9), (17) defines a function

(18) $\lambda(x, y) = f(x) + \sum_{n=1}^{\infty} \int_0^y \cdots (n) \cdots \int_0^y f(x - g_1) \prod_{i=1}^{n-1} f(g_i - g_{i+1})\, f(g_n)\, dg_1 \cdots dg_n$.

t-1 


By (8), this function is continuous in $(x, y)$ and may be differentiated term by
term with respect to $y$. Further, $\lambda_{01}(x, y)$ thus gotten is continuous in $(x, y)$, so
that $f(x - g)\lambda_{01}(g, y)$ is continuous in $(g, y)$. Hence, (17) may be differentiated
with respect to $y$, giving

(19) $\lambda_{01}(x, y) = f(x - y)\lambda(y, y) + \int_0^y f(x - g)\, \lambda_{01}(g, y)\, dg$.

Since, under (9), the integral equation

(20) $\alpha(x, y) = f(x - y) + \int_0^y f(x - g)\, \alpha(g, y)\, dg$

has a unique continuous solution for every fixed $y$, (16) and (19) give


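(21) $\dfrac{\lambda_{01}(x, y)}{\lambda(y, y)} = \dfrac{G_{01}(x, y)}{G(y, y)}$.

Hence

$\dfrac{\int_0^y \lambda_{01}(x, y)\, dx}{\lambda(y, y)} = \dfrac{\int_0^y G_{01}(x, y)\, dx}{G(y, y)}$

and

(22) $\dfrac{\frac{d}{dy} \int_0^y \lambda(x, y)\, dx}{\lambda(y, y)} = \dfrac{\frac{d}{dy} \int_0^y G(x, y)\, dx}{G(y, y)}$.

(23) Let $f(t) = f(-t)$.

Then it is obvious from the definition that

(24) $G(0, y) = G(y, y)$.

Further, by (15),

(25) $\dfrac{G_{01}(x, y)}{G(y, y)} = f(x - y) + \int_0^y f(x - g)\, \dfrac{G_{01}(g, y)}{G(y, y)}\, dg$

so that, under (9), (25) gives for $G_{01}(x, y)/G(y, y)$ the unique expression

$f(x - y) + \sum_{n=1}^{\infty} \int_0^y \cdots (n) \cdots \int_0^y f(x - g_1) \prod_{i=1}^{n-1} f(g_i - g_{i+1})\, f(g_n - y)\, dg_1 \cdots dg_n$

which, by (23), is equal to

$f(y - x) + \sum_{n=1}^{\infty} \int_0^y \cdots (n) \cdots \int_0^y f(y - g_n) \prod_{i=1}^{n-1} f(g_{i+1} - g_i)\, f(g_1 - x)\, dg_1 \cdots dg_n$.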

Since, under (8), we may interchange summation and integration with respect
to $x$, it follows that

(26) $\int_0^y \dfrac{G_{01}(x, y)}{G(y, y)}\, dx = \int_0^y f(y - x)\, dx + \sum_{n=1}^{\infty} \int_0^y \cdots (n + 1) \cdots \int_0^y f(y - g_n) \prod_{i=1}^{n-1} f(g_{i+1} - g_i)\, f(g_1 - x)\, dg_1 \cdots dg_n\, dx$

which, by a change of integration indices and a referral to (7), is seen to equal
$[G(y, y) - 1]$. (26) thus gives

(27) $\int_0^y G_{01}(x, y)\, dx = G(y, y)[G(y, y) - 1]$.


Further, by (16), (24), and (27),

(28) $\int_0^y \lambda(x, y)\, dx = G(0, y) - 1$

so that

(29) $\dfrac{d}{dy} \int_0^y \lambda(x, y)\, dx = \dfrac{d}{dy} G(0, y)$

while (24) and (27) also yield

(30) $\dfrac{d}{dy} \int_0^y G(x, y)\, dx = [G(0, y)]^2$.

Hence, by (22), (29), and (30),

(31) $\lambda(y, y) = \dfrac{d}{dy} G(0, y) \Big/ G(0, y)$.

Finally, substituting (31) in (21), and remembering the definition of $\lambda$ given in
(16), we get, using (24),

(32) $G(0, y)[G_{11}(x, y) + G_{02}(x, y)] = \dfrac{d}{dy} G(0, y)\, [G_{10}(x, y) + 2G_{01}(x, y)]$.

The conditions under which (32) holds are, in summary, (8), (11), and (23). 
If $f(t)$ has an expansion

(33) $f(t) = \sum_{i=0}^{\infty} A_i t^i; \quad |t| < T$

it is clear from (7) that

(34) $G(x, y) = \sum_{i,j=0}^{\infty} B_{ij} x^i y^j$

for (x, y) e S, where S :[To < x < TuO < y < Ti + To]; To <0,Ti< T. 





Substituting (34) in (32), and equating coefficients of like powers of (x, y), 
we obtain the recursion formulae 

(35) 2 —j +1]= 2 + l][i — A:]; f:0, 

From (10), it is readily verified that $B_{i0} = 0$ for $i \neq 0$, so that equations (35)
give solutions for the $B_{ij}$ in terms of the $B_{0k}$. These solutions are of interest
since they show a one-to-one correspondence between the functions $G(0, y)$
and $G(x, y)$, for $(x, y) \in [R \cap S]$.


NUMERICAL INTEGRATION FOR LINEAR SUMS OF EXPONENTIAL 

FUNCTIONS 

By Robert E. Greenwood 

The University of Texas and the Institute for Numerical Analysis¹

1. Introduction. The methods of numerical integration going by the names
trapezoidal rule, Simpson's rule, Weddle's rule, and the Newton-Cotes formulae
are of the type

(1) $\int_{-1}^{1} f(x)\, dx \sim \sum_{i=0}^{n} \lambda_{in} f(x_{in})$

where the abscissae $\{x_{in}\}$ are uniformly distributed on a finite interval, chosen
as $(-1, 1)$ for convenience,

(2) $x_{in} = -1 + \dfrac{2i}{n}, \quad i = 0, 1, 2, \cdots, n,$

and where the set of constants $\{\lambda_{in}\}$ depend on the name of the rule and the value
of $n$ but not on the function $f(x)$. Throughout this note all abscissae will be
assumed to be uniformly distributed on $(-1, 1)$ unless the contrary is explicitly
stated.

Since correspondence relation (1) involves $(n + 1)$ constants $\{\lambda_{in}\}$, it might
be possible to choose $(n + 1)$ arbitrary functions $g_j(x)$, $j = 0, 1, 2, \cdots, n$,
and require that the set $\{\lambda_{in}\}$ be the solution, if such exists, of the $(n + 1)$
simultaneous linear equations

(3) $\int_{-1}^{1} g_j(x)\, dx = \sum_{i=0}^{n} \lambda_{in}\, g_j(x_{in}), \quad j = 0, 1, 2, \cdots, n.$

Indeed, the selection

(4) $g_j(x) = x^j, \quad j = 0, 1, 2, \cdots, n,$

will give a set of $(n + 1)$ simultaneous equations of form (3) and the solution $\{\lambda_{in}\}$
is the set of Newton-Cotes weights for that value of $n$. The numerical evaluation
of $\{\lambda_{in}\}$ is best accomplished by other and more sophisticated methods, however.²

¹ This work was performed with the financial support of the Office of Naval Research of
the Navy Department.

Because of linearity in both the integral and the finite summation, once the
constants $\{\lambda_{in}\}$ have been determined for a specific set of functions $\{g_j(x)\}$,
correspondence relation (1) is exact for any linear combination of that fundamental
set. Thus, for example, for the fundamental set (4), correspondence
relation (1) with the appropriate values $\{\lambda_{in}\}$ is exact for all polynomials of
degree less than or equal to $n$.

Although tradition favors the set of functions (4), there is nothing compelling
about such a selection. Indeed, two other possible choices might be

(5) $g_j(x) = e^{jx}, \quad j = 0, 1, 2, \cdots, n$

and

(6) $g_j(x) = e^{jx}, \quad j = -m, -m + 1, \cdots, 0, 1, \cdots, m - 1, m; \quad n = 2m.$

These choices would seem to be appropriate whenever numerical methods are
being applied to exponential growth curves or exponential decay curves.

2. Use of the basic set $g_j(x) = e^{jx}$. If integration relation (1) be made exact
for the set $\{e^{jx}\}$, $j = 0, 1, \cdots, n$ with evenly spaced $x$ abscissae, the set (3) of
$(n + 1)$ simultaneous linear equations in the unknowns $\{\lambda_{in}\}$, $i = 0, 1, \cdots, n$
is obtained. Call the solution of this system $\{a_{in}\}$; solution values for $n = 1, 2,
3, 4, 5, 6$ are tabulated below.
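
The system (3) is small enough to be solved directly, which gives an independent check on individual tabulated coefficients. The following sketch is an illustration only and is not the procedure used by the computing service mentioned in the note; the function names `solve_linear` and `exponential_weights` are hypothetical, and plain Gaussian elimination is an assumption of the example (for larger $n$ the system becomes ill-conditioned and a more careful method would be needed).

```python
import math


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def exponential_weights(n, exponents):
    """Weights making relation (1) exact for exp(j x), j in exponents, on n + 1 even abscissae."""
    nodes = [-1.0 + 2.0 * i / n for i in range(n + 1)]  # abscissae (2)

    def integral(j):
        # integral of exp(j x) over (-1, 1)
        return 2.0 if j == 0 else (math.exp(j) - math.exp(-j)) / j

    A = [[math.exp(j * x) for x in nodes] for j in exponents]
    b = [integral(j) for j in exponents]
    return solve_linear(A, b)


a1 = exponential_weights(1, range(0, 2))   # set (5) with n = 1
b2 = exponential_weights(2, range(-1, 2))  # set (6) with n = 2, m = 1
print(a1, b2)
```

For $n = 1$ the system can also be solved by hand, giving $a_{11} = (e^2 - 3)/(e^2 - 1)$ and $a_{01} = 2 - a_{11}$, which agrees with the first two tabulated values.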

For the symmetric case where integration relation (1) is made exact for
$\{e^{jx}\}$, $j = -m, -m + 1, \cdots, m - 1, m$; $n = 2m$, a similar but different set of
linear equations (3) results for the unknowns $\{\lambda_{in}\}$. Call the solution of this
system $\{b_{in}\}$. As implied above, only even values of $n$ are used in order to preserve
the symmetry, and values of $\{b_{in}\}$ are tabulated below for $n = 2, 4, 6$.

n = 1    a_01 =  1.31303 5285
         a_11 =  0.68696 4715

n = 2    a_02 =  0.21805 032+        b_02 = 0.32260 623-
         a_12 =  1.49780 742         b_12 = 1.35478 755
         a_22 =  0.28414 226-        b_22 = 0.32260 623-

n = 3    a_03 =  0.51324 284
         a_13 =  0.22445 055
         a_23 =  1.08155 527
         a_33 =  0.18075 134

n = 4    a_04 = -0.13716 639+        b_04 = 0.15048 171
         a_14 =  1.40098 548         b_14 = 0.73243 318
         a_24 = -0.30895 914         b_24 = 0.23417 022
         a_34 =  0.91710 903         b_34 = 0.73243 318
         a_44 =  0.12803 103-        b_44 = 0.15048 171

n = 5    a_05 =  0.68919 3
         a_15 = -1.07644 3
         a_25 =  2.12534 6
         a_35 = -0.63595 6
         a_45 =  0.79933 8
         a_55 =  0.09852 18

n = 6    a_06 = -0.83607
         a_16 =  3.54128
         a_26 = -3.88102
         a_36 =  3.32254
         a_46 = -0.94685
         a_56 =  0.72075
         a_66 =  0.07937 5+

² Whittaker and Robinson, The Calculus of Observations, 4th Edition, (1946), London,
pp. 152-156.

The computing service of the Institute for Numerical Analysis has supplied the author 
with most of the coefficients tabulated above. 

3. Estimates of the error term. The choices of the coefficients $\{a_{in}\}$ and
$\{b_{in}\}$ are such that integration relation (1) is exact whenever

(7) $f(x) = A_0 + A_1 e^{x} + \cdots + A_n e^{nx}$ and $\lambda_{in} = a_{in}$,

and whenever

(8) $f(x) = B_{-m} e^{-mx} + \cdots + B_0 + \cdots + B_m e^{mx}$ and $\lambda_{in} = b_{in}$.

When $f(x)$ is not of these prescribed forms, the error in using correspondence (1)
may be of some importance. By making the transformation

(9) $u = e^{x}, \quad f(x) = f(\log u) = g(u)$

integration relation (1) becomes

(10) $\int_{e^{-1}}^{e} g(u)\, \dfrac{du}{u} \sim \sum_{i=0}^{n} \lambda_{in}\, g(u_{in})$

where the $\{u_{in}\}$ are not evenly distributed. By approximating $g(u)$ by its Taylor's
series with a remainder term, the following expressions for the error in using
correspondence (1) can be obtained:

Using the coefficients {fli*}, 

(u) Error < [2 + g I-*, l] 

and, using the coefficients {bin}. 


Table of part 2, continued (symmetric coefficients for n = 6):

         b_06 = 0.09443 5
         b_16 = 0.53464 7
         b_26 = 0.01139 3
         b_36 = 0.71905 0
         b_46 = 0.01139 3
         b_56 = 0.53464 7
         b_66 = 0.09443 5





( 12 ) 


Error < 


- iV"-^* 

2 e / f e*" — e~" 
( 2 m + 1)1 L 





Neither of these error expressions can be said to be very practical in actual
computation, and neither appears suitable for establishing convergence properties
of the type

(13) $\lim_{n \to \infty} \sum_{i=0}^{n} \lambda_{in} f(x_{in}) = \int_{-1}^{1} f(x)\, dx.$

However, both (11) and (12) reduce to zero when $f(x)$ is of the form prescribed
by (7) or (8) respectively.


4. Numerical examples. As illustrative numerical examples, the case $n = 4$
was selected and several typical functions were integrated approximately by the
positive power exponential rule, the symmetrical exponential rule and the
Newton-Cotes formula,

$\int_{-1}^{1} f(x)\, dx \sim \frac{1}{45}[7f(-1) + 32f(-\tfrac{1}{2}) + 12f(0) + 32f(\tfrac{1}{2}) + 7f(1)].$

Values of $\{a_{i4}\}$ and $\{b_{i4}\}$ are given in the tables in part 2. The typical functions
used were $x^2$, $e^{2x}$, $1/(x + 3)$, $e^{-x^2}$, $xe^{x}$, $x^6$, and $e^{2.2x}$. The following results were
obtained:

Function     Positive Power    Symmetrical      Newton-Cotes     8 Decimal Approximation
             Exponential       Exponential                       to Exact Value

x^2           .5703 8827        .6671 8001       .6666 6666       .6666 6667
e^{2x}       3.6268 6044       3.6268 6041      3.6317 3108      3.6268 6041
1/(x + 3)     .6828 6353        .6931 5792       .6931 7460       .6931 4718
e^{-x^2}     1.4930 1396       1.4857 2754      1.4887 4582      1.4936 4827
xe^{x}        .7292 4338        .7353 6007       .7361 7480       .7357 5888
x^6           .0270 8487        .3238 5196       .3333 3332       .2857 1429
e^{2.2x}     4.0527 7287       4.0530 7585      4.0607 7415      4.0519 1379


From this tabulation, it would appear that the symmetrical exponential
method compares favorably with the Newton-Cotes method for such typical
functions as $1/(x + 3)$, $xe^{x}$, $x^6$, and $e^{2.2x}$. Note that the choice of $x^2$ or $e^{2x}$
is not really a fair choice when comparing these two methods, since Newton-Cotes
is derived so as to give exactness for $x^2$ and the symmetrical exponential
so as to give exactness for $e^{2x}$.
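
The Newton-Cotes column of the tabulation is easy to recompute, since the five-point formula of this section has fixed weights 7, 32, 12, 32, 7 over 45. The sketch below is an illustration only; the function name `newton_cotes_n4` is a hypothetical label, and the three test functions are taken from the list of typical functions above.

```python
import math


def newton_cotes_n4(f):
    """Five-point Newton-Cotes approximation to the integral of f over (-1, 1)."""
    weights = [7, 32, 12, 32, 7]
    nodes = [-1.0, -0.5, 0.0, 0.5, 1.0]
    return sum(w * f(x) for w, x in zip(weights, nodes)) / 45.0


approx_x6 = newton_cotes_n4(lambda x: x ** 6)
approx_e2x = newton_cotes_n4(lambda x: math.exp(2.0 * x))
approx_recip = newton_cotes_n4(lambda x: 1.0 / (x + 3.0))
print(approx_x6, approx_e2x, approx_recip)
```

For $x^6$ the rule returns exactly $15/45 = 1/3$, against the true value $2/7$, which is the large Newton-Cotes discrepancy visible in the tabulation.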





SMOOTHEST APPROXIMATION FORMULAS 

By Arthur Sard¹

Queens College

Introduction. Consider a process of approximation which operates on a
function $x = x(t)$. The error in the process may be thought of as a sum $R + \delta A$,
where $R$ is the error that would be present if $x$ were exact and $\delta A$ is the error due
to errors in $x$. (Precise definitions are given below.) Suppose that one wishes to
choose one process $A$ from a class $\mathcal{C}$ of processes. In some situations it is appropriate
to base the choice on $R$ alone²; in others it is appropriate to consider $\delta A$.

The primary purpose of the present note is to formulate a criterion of smoothest
approximation: That $A$ in $\mathcal{C}$ is smoothest which minimizes the variance of
$\delta A$. A criterion based on both $R$ and $\delta A$ is also suggested. (Sections 1 and 2.)
Smoothest approximate integration formulas of one type are derived in Section 3.

Progress in the technique of estimating the covariance function of the errors
in $x$ will lead to further applications of the criterion of smoothest approximation.

1. Approximation of a functional. Suppose that $X$ is a space of functions
$x = x(t)$ each of which is continuous on $a \le t \le b$. Let $f[x]$ be a functional
defined on $X$; that is, $f[x]$ is a real number defined for each $x \in X$. For example,
$X$ might be the space of functions with second derivatives on $[a, b]$ and $f[x]$ might
be $x''(u)$, where $u$ is a fixed number in $[a, b]$.

Suppose that $f[x]$ is to be approximated by a Stieltjes integral

(1) $A = \int_a^b x(t)\, d\alpha(t), \quad x \in X,$

where $\alpha$ is a function of bounded variation. The remainder in the approximation
of $f[x]$ by $A$ is

$R = A - f[x].$

If the approximation (1) operates on $x + \delta x$ instead of $x$, the result is $A + \delta A =
\int_a^b (x + \delta x)\, d\alpha$; and the error in the approximation of $f[x]$ by $A + \delta A$ is $R + \delta A$,
where

(2) $\delta A = \int_a^b \delta x(t)\, d\alpha(t).$

Consider a class $\mathcal{C}$ of approximations $A$, each of the form (1). We shall propose
a criterion for characterizing the "smoothest $A$" in $\mathcal{C}$, relative to the covariance
function of the errors $\delta x$.

¹ The author gratefully acknowledges financial support received from the Office of Naval
Research.

² "Best approximate integration formulas; best approximation formulas," Amer. Jour.
of Math., Vol. 71 (1949), pp. 80-91.




Assume that $\delta x = \delta x(t)$ is a stochastic process with mean zero³ and covariance
function $\sigma(t, u) = E[\delta x(t)\, \delta x(u)]$. Then, by (2), $\delta A$ is a stochastic variable; and⁴

(3) $E\,\delta A = E \int \delta x\, d\alpha = \int 0\, d\alpha = 0,$

$E(\delta A)^2 = v = E \int \delta x(t)\, d\alpha(t) \int \delta x(u)\, d\alpha(u) = \iint \sigma(t, u)\, d\alpha(t)\, d\alpha(u).$

Criterion. That $A$ (if any) in $\mathcal{C}$ is smoothest which minimizes the variance
$v$ of $\delta A$.

In particular cases, this criterion (least squares) has been proposed and used
by Chebyshev and others. An application to approximate integration is given in
section 3 below.

One may extend this discussion to cases in which the approximations $A$
involve derivatives of $x$.

Remark. The criterion of best approximation² may be combined with the
above criterion of smoothest approximation as follows: That $A$ (if any) in $\mathcal{C}$ is
the best compromise which minimizes a specified combination of the variance
of $\delta A$ and the modulus of $R$. Here it is assumed that the remainders $R$ satisfy the
conditions for the existence of the modulus.²


2. Approximation of a function. One may extend the preceding discussion
to the case in which $y = f[x]$ is an operation to a space of functions $y = y(u)$,
$a \le u \le b$; and in which the approximation of $f[x]$ is

$A = \int_a^b x(t)\, d_t\, \alpha(t, u), \quad x \in X,$

where, for each $u$, $\alpha$ is a function of bounded variation in $t$. Then, for each $u$,
$\delta A$ has a variance $v(u)$. Criterion. That $A$ (if any) in a class of approximations is
smoothest which minimizes $v(u)$ for all $u$; failing such an $A$, that $A$ (if any) is
smoothest which minimizes the integral of $v(u)$, or alternatively, the supremum
of $v(u)$, over $a \le u \le b$.

3. Smoothest approximate integration formulas in a particular case.⁵ Let $m$
and $n$ be fixed integers; $m \ge 1$, $n \ge 0$. Let $\mathcal{C} = \mathcal{C}_{m,n}$ be the class of all approximations
of

$\int_{-m/2}^{m/2} x(t)\, dt = f[x]$

of the form

$A = \sum_{i=-m/2}^{m/2} b_i\, x(i),$

the $m + 1$ constants $b_i$ being such that $A = f[x]$ whenever $x(t)$ is a polynomial of
degree $n$. Throughout this section $i$ is to range over the $m + 1$ values $i = -m/2$,
$-m/2 + 1, \cdots, +m/2$. Suppose that the errors $\delta x(i)$ are independent, with
common variance $\sigma^2$, and with mean zero. Then $\alpha(t)$ is a step function with jumps
at $t = i$; and

$v = \sigma^2 \sum_i b_i^2.$

³ The essential point here is that $E\,\delta x(t) = m(t)$ be known for each $t$; for given $m(t)$, one
could and would replace $x + \delta x$ by $x + \delta x - m$.

⁴ We assume here that the integrals in (3) exist and that the inversions of $E$ and $\int d\alpha$
are valid. For this it is sufficient that be integrable relative to the product measure $\alpha w$
for all functions $\alpha$ corresponding to elements of $\mathcal{C}$, where $w$ is the measure in the underlying
probability space relative to which $E$ is the operator $\int dw$. Cf. J. L. Doob, "Probability in
function space," Bull. Amer. Math. Soc., Vol. 53 (1947), especially pp. 26, 27.

⁵ The approximate integration formulas of this section are of such a nature that one
would expect them to be known. The values of $J$ at the end are probably new.

i 

The smoothest approximation in $\mathcal{C}_{m,n}$ is the one for which $v$ is a minimum.
(The $m + 1$ variables $b_i$ in $v$ are subject to $n + 1$ constraints due to the condition
that the approximation be exact for degree $n$. The set $\mathcal{C}_{m,n}$ is empty if and only
if $m$ is less than the largest even integer contained in $n$.)

If $n = 0$ or 1, the smoothest formula in $\mathcal{C}_{m,n}$ is the one for which all the
coefficients are equal:

$b_i = m/(m + 1);$

in which case

$v = m^2 \sigma^2/(m + 1).$

If $n = 2$ or 3, the smoothest formula in $\mathcal{C}_{m,n}$ is characterized by the following
relations:

$b_i = \lambda_0 + \lambda_1 i^2,$

$\lambda_0 = m(2m^2 + 9m - 6)/2(m - 1)(m + 1)(m + 3),$

$\lambda_1 = -30m/(m - 1)(m + 1)(m + 2)(m + 3);$

in which case

$v/\sigma^2 = \lambda_0 m + \lambda_1 m^3/12.$

Thus, the smoothest approximation in $\mathcal{C}_{6,2}$ or in $\mathcal{C}_{6,3}$ is the following:

$A = \tfrac{1}{2}[x(-3) + x(3)] + \tfrac{6}{7}[x(-2) + x(2)] + \tfrac{15}{14}[x(-1) + x(1)] + \tfrac{8}{7}\, x(0).$
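
The closed forms for $\lambda_0$ and $\lambda_1$ can be evaluated in exact rational arithmetic, which gives a check on the coefficients of the formula for $m = 6$. The sketch below is illustrative only; the function name `smoothest_weights` is a hypothetical label, and the code does nothing beyond substituting into the relations stated above for $n = 2$ or 3.

```python
from fractions import Fraction


def smoothest_weights(m):
    """Return the abscissae i, the coefficients b_i = lam0 + lam1 * i**2, and (lam0, lam1)."""
    lam0 = Fraction(m * (2 * m * m + 9 * m - 6), 2 * (m - 1) * (m + 1) * (m + 3))
    lam1 = Fraction(-30 * m, (m - 1) * (m + 1) * (m + 2) * (m + 3))
    half = Fraction(m, 2)
    points = [-half + k for k in range(m + 1)]  # i = -m/2, -m/2 + 1, ..., m/2
    weights = [lam0 + lam1 * i * i for i in points]
    return points, weights, lam0, lam1


points, b, lam0, lam1 = smoothest_weights(6)
variance_ratio = sum(w * w for w in b)  # v / sigma^2 as the sum of squared coefficients
print(b, variance_ratio)
```

For $m = 6$ this gives $\lambda_0 = 8/7$ and $\lambda_1 = -1/14$. The coefficients sum to 6, the length of the interval, and reproduce $\int t^2\, dt = 18$, so the two constraints for exactness on even powers up to degree 2 are met; odd powers are integrated exactly by symmetry.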

By the method of Lagrange's multipliers, one may establish the following
relations for the smoothest formula in $\mathcal{C}_{m,n}$. Here $i$ has the same range of values
as before; $\mu$ and $\nu$ range over $0, 1, \cdots, [n/2]$.

$b_i = \sum_{\mu} \lambda_\mu\, i^{2\mu},$

where

$c_\mu = m^{2\mu + 1}/4^{\mu}(2\mu + 1),$

and $\lambda_\mu$ are determined by the equations

$\sum_{\mu} \sum_{i} \lambda_\mu\, i^{2\mu + 2\nu} = c_\nu.$

The class $\mathcal{C}_{m,n}$ is such that for each $A \in \mathcal{C}_{m,n}$ there is a function $k(t)$ with the
following property:²

$R = \int_{-m/2}^{m/2} x^{(n+1)}(t)\, k(t)\, dt,$

whenever $x$ is a function with continuous $(n + 1)$th derivative. The quantity

$J = \int_{-m/2}^{m/2} k^2(t)\, dt$

is useful in appraising $R$, since

$R^2 \le J \int_{-m/2}^{m/2} [x^{(n+1)}(t)]^2\, dt,$

by Schwarz's inequality.

Values of $J$ for the smoothest formulas are as follows.

$n = 0$ : $J = m^2/6(m + 1).$

$n = 1$ : $J = m^2(3m^2 + 2m + 1)/360(m + 1).$

For $n = 2$ and 3, and $m \le 6$, the numerical values of $J$ are as follows.

m      J (n = 2)           J (n = 3)

2      1/1,890             1/9,072
3      11/8,960            13/17,920
4      134/33,075          62,539/13,891,500
5      1,865/150,528       136,223/6,322,176
6      8/245               6,683/82,320

For the method of calculation of $J$, as well as the transformation of $J$ under a
linear transformation of $t$, the reader may consult the paper².





ON THE POWER FUNCTION OF THE "BEST" t-TEST SOLUTION OF THE
BEHRENS-FISHER PROBLEM

By John E. Walsh 
The Rand Corporation 

1. Introduction. The Behrens-Fisher problem is concerned with significance 
tests for the difference of the means of two normal populations when the ratio 
of the variances of the populations is unknown. Denote one population by 
N{ai , a\) and the other by AT( 02 , (t\)^ where the notation AT(a, <r^) represents a 
normal population with mean a and variance cr*. Let m sample values be drawn 
from N{ai , er?) and n sample values from AT( 02 , < 72 ) where m < n. Then Scheff^ 

[1] has shown that certain optimum properties are possessed by a <-test solution 
he proposed for the Behrens-Fisher problem, in which the numerator of t is based 
on the difference of the means of the samples while the denominator is based on 
the square root of a function of the sample values which has a x^-distribution 
with m — 1 degrees of freedom. The purpose of this note is to compare the power 
function of this ^test with the power function of the corresponding most powerful 
test for the case in which the ratio of variances ai/al is also known (only one¬ 
sided and symmetrical tests are considered). This comparison is made by com¬ 
puting the power efficiency (see section 2 for definition) of Scheff^^s test. 

It is sufficient to limit power efficiency investigations to one-sided tests. As
shown in [2], a symmetrical $t$-test with significance level $2\alpha$ has the same power
efficiency as the corresponding one-sided $t$-test with significance level $\alpha$. Equation
(2) of section 2 furnishes an explicit formula whereby approximate power
efficiencies can be computed for a wide range of values of $\alpha$, $m$, $n$. Table 1 contains
values of (2) for $\alpha = .05, .01$ and several values of $m$ and $n$.

For the situation considered here, a power efficiency of $100r\%$ has the quantitative
interpretation that the given test based on samples of size $m$ and $n$ has
approximately the same power function as the corresponding most powerful
test based on samples of size $rm$ and $rn$. Intuitively the power efficiency
of a test measures the percentage of available information per observation
which is utilized by that test.

2. Power efficiency derivations. The basic notion of the power efficiency of
a significance test is given in [2]. For the present case the problem is to determine
the value $r$ such that a most powerful test of the same hypothesis (same
significance level) based on $rm$ and $rn$ sample values will have approximately the
same power function as the given $t$-test based on $m$ and $n$ sample values (from
$N(a_1, \sigma_1^2)$ and $N(a_2, \sigma_2^2)$ respectively). Here the value of $\sigma_1^2/\sigma_2^2$ is assumed to be
known. Then the power efficiency of the given $t$-test equals $100r\%$.

If the ratio of variances $\sigma_1^2/\sigma_2^2$ is known, the most powerful significance test
(one-sided and symmetrical) for the difference of means of the two normal
populations is a $t$-test where the numerator of $t$ is based on the difference of the



TABLE 1

Percentage Power Efficiencies for Certain Values of m and n

$\alpha = .05$

 m \ n     4     6    10    15    20    30    50   100     ∞
     4  79.6  73.5  67.2  63.4  61.4  59.3  57.6  56.2  54.9
     6        86.9  82.9  80.2  78.7  77.0  75.5  74.2  72.9
    10              92.6  90.9  89.8  88.6  87.3  86.2  85.0
    15                    95.2  94.4  93.5  92.5  91.5  90.3
    20                          96.4  95.7  94.9  94.0  92.9
    30                                97.7  97.1  96.4  95.3
    50                                      98.6  98.1  97.2
   100                                            99.3  98.6
     ∞                                                 100.0

$\alpha = .01$

 m \ n     6     8    10    15    20    30    50   100     ∞
     6  74.9     …  66.7  61.2  57.9  54.3  51.1  48.6  45.9
     8        81.3  78.8  74.7  72.1  69.1  66.3  63.9  61.4
    10              85.3  81.9  79.8  77.2  74.7  72.5  69.9
    15                       …  88.9     …     …  83.1     …
    20                          92.9  91.4  89.8  88.1  85.8
    30                                95.3  94.1  92.8  90.7
    50                                      97.2  96.3  94.5
   100                                            98.6  97.3
     ∞                                                 100.0



two sample means while the denominator is based on the square root of a function
of the sample values and $\sigma_1^2/\sigma_2^2$ which has a $\chi^2$-distribution with $m + n - 2$
degrees of freedom [1, p. 43]. Thus the problem is that of comparing the power
functions of two $t$-tests.

As stated in section 1, it is sufficient to consider one-sided tests. We find, using
a modification of the normal approximation to the power function of a one-sided
$t$-test given in [3], that Scheffé's one-sided $t$-test for the Behrens-Fisher problem
and the corresponding most powerful one-sided test ($\sigma_1^2/\sigma_2^2$ known) have
approximately the same power function when $r$ is chosen so that

$$K_\alpha - \delta\sqrt{r}\left\{1 - \frac{K_\alpha^2}{2[(m+n)r - 2]}\right\}^{1/2} = K_\alpha - \delta\left[1 - \frac{K_\alpha^2}{2(m-1)}\right]^{1/2},$$

where $\alpha$ is the significance level of the tests, $K_\alpha$ is the value of the standardized
normal deviate exceeded with probability $\alpha$, and $\delta$ is a function of $m$, $n$,
$a_1$, $a_2$, $\sigma_1^2$, $\sigma_2^2$ and the given hypothetical value of $a_1 - a_2$ being tested. This
condition for the approximate equality of the power functions is reasonably
accurate for the following cases: $\alpha = .05$, $m \ge 4$; $\alpha = .025$, $m \ge 5$; $\alpha = .01$,
$m \ge 6$; $\alpha = .005$, $m \ge 7$. The accuracy of the approximation increases as $m$
increases.

Hence a value of $r$ such that the two power functions are approximately equal
is determined by the equation

(1)    $r\left\{1 - \dfrac{K_\alpha^2}{2[(m+n)r - 2]}\right\} = 1 - \dfrac{K_\alpha^2}{2(m-1)}.$

Let

$$A = A(m, \alpha) = 1 - \frac{K_\alpha^2}{2(m-1)}.$$

Then solving (1) for the appropriate root yields

$$r = \frac{1}{2(m+n)}\left\{2 + (m+n)A + \frac{K_\alpha^2}{2} + \sqrt{\left[2 + (m+n)A + \frac{K_\alpha^2}{2}\right]^2 - 8(m+n)A}\right\}.$$


Thus the power efficiency of Scheffé's one-sided $t$-test solution to the Behrens-
Fisher problem, for the case in which the ratio of the variances is also known, is
approximately equal to

(2)    $\dfrac{50}{m+n}\left\{2 + (m+n)A + \dfrac{K_\alpha^2}{2} + \sqrt{\left[2 + (m+n)A + \dfrac{K_\alpha^2}{2}\right]^2 - 8(m+n)A}\right\}\%$

for suitable values of $\alpha$ and $m$.
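Formula (2) is easy to evaluate numerically. The following is a minimal sketch, not part of the original note: the function name `power_efficiency` is hypothetical, $K_\alpha$ is taken from the Python standard library's normal quantile function, and the branch for infinite $n$ uses the limiting value $100A$ of (2), which is our own simplification. Under these assumptions it appears to reproduce entries of Table 1 to the printed accuracy.

```python
import math
from statistics import NormalDist


def power_efficiency(m, n, alpha):
    """Approximate percentage power efficiency, evaluated from equation (2)."""
    k = NormalDist().inv_cdf(1.0 - alpha)   # K_alpha: normal deviate exceeded with prob. alpha
    a = 1.0 - k * k / (2.0 * (m - 1))       # A(m, alpha)
    if math.isinf(n):
        # Assumption: limiting value of (2) as n grows without bound.
        return 100.0 * a
    s = 2.0 + (m + n) * a + k * k / 2.0
    root = math.sqrt(s * s - 8.0 * (m + n) * a)
    return 50.0 / (m + n) * (s + root)


if __name__ == "__main__":
    # Print a few cells for comparison with Table 1 (alpha = .05).
    for m, n in [(4, 4), (4, math.inf), (10, 20)]:
        print(m, n, round(power_efficiency(m, n, 0.05), 1))
```

Dividing the result by 100 gives $r$, so that, for example, an efficiency near 80% at $m = n = 4$ corresponds to a most powerful test based on roughly 3.2 observations per sample.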


REFERENCES

[1] Henry Scheffé, "On solutions of the Behrens-Fisher problem based on the t-distribution,"
Annals of Math. Stat., Vol. 14 (1943), pp. 35-44.

[2] John E. Walsh, "Some significance tests for the median which are valid under very
general conditions," Annals of Math. Stat., Vol. 20 (1949), pp. 64-81.

[3] N. L. Johnson and B. L. Welch, "Applications of the non-central t-distribution,"
Biometrika, Vol. 31 (1940), p. 376.





A NOTE ON FISHER'S INEQUALITY FOR BALANCED INCOMPLETE
BLOCK DESIGNS

By R. C. Bose

Institute of Statistics, University of North Carolina

1. An experimental design in which $v$ varieties or treatments are arranged in
$b$ blocks is called a balanced incomplete block design if

(i) Each block has exactly $k$ treatments ($k < v$), no treatment occurring twice
in the same block.

(ii) Each treatment occurs in exactly $r$ blocks.

(iii) Any two treatments occur together in exactly $\lambda$ blocks.

It is easy to see that the parameters $v, b, r, k, \lambda$ of the design satisfy the
relations

(1.0)    $bk = vr$

(1.1)    $\lambda(v - 1) = r(k - 1)$.

Also it is readily seen that

(1.2)    $r > \lambda$

for otherwise with any given treatment every other treatment would occur in
every block. This would make $k = v$, and the design would become a 'randomised
block design'.

Fisher (1940) showed that a necessary condition for the existence of a balanced
incomplete block design with $v$ treatments and $b$ blocks is

(1.3)    $b \ge v$.

It is the object of this note to give a very simple proof of Fisher's inequality.

2. Consider a balanced incomplete block design with parameters

(2.0)    $v, b, r, k, \lambda$

and let

(2.1)    $n_{ij} = 1$ or $0$

according as the $i$th treatment does or does not occur in the $j$th block. Clearly

(2.2)    $\sum_{j=1}^{b} n_{ij}^2 = r$

(2.3)    $\sum_{j=1}^{b} n_{ij}\, n_{i'j} = \lambda \qquad (i \ne i')$.





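If possible let $b < v$. Consider the $v \times v$ matrix

(2.4)    $N = \begin{bmatrix} n_{11} & n_{12} & \cdots & n_{1b} & 0 & \cdots & 0 \\ n_{21} & n_{22} & \cdots & n_{2b} & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots & \vdots & & \vdots \\ n_{v1} & n_{v2} & \cdots & n_{vb} & 0 & \cdots & 0 \end{bmatrix}$

where the last $v - b$ columns of $N$ consist of zeros. It follows from (2.2) and (2.3)
that

(2.5)    $NN' = \begin{bmatrix} r & \lambda & \cdots & \lambda \\ \lambda & r & \cdots & \lambda \\ \vdots & \vdots & & \vdots \\ \lambda & \lambda & \cdots & r \end{bmatrix}$

where $N'$ denotes the transpose of $N$. Therefore

(2.6)    $\det(NN') = \{r + \lambda(v - 1)\}(r - \lambda)^{v-1} = kr(r - \lambda)^{v-1}$    from (1.1).

But

(2.7)    $\det(NN') = \det N \det N' = 0$,

since $N$ has at least one column of zeros. This makes $r = \lambda$, and contradicts (1.2).
Hence the assumption $b < v$ is wrong, and we must have

(2.8)    $b \ge v$.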
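The identities used in the proof can be checked on a small design. The sketch below is an illustration only and is not taken from the note: it assumes the well-known design with $v = b = 7$, $r = k = 3$, $\lambda = 1$ (the Fano plane), and the helper names `incidence`, `gram` and `det` are hypothetical. It builds the incidence matrix of (2.1), forms $NN'$, and evaluates the determinant exactly with rational arithmetic so it can be compared with (2.6).

```python
from fractions import Fraction

# Blocks of an assumed example design with v = b = 7, r = k = 3, lambda = 1.
BLOCKS = [(1, 2, 3), (1, 4, 5), (1, 6, 7), (2, 4, 6), (2, 5, 7), (3, 4, 7), (3, 5, 6)]
V, B, R, K, LAM = 7, 7, 3, 3, 1


def incidence(blocks, v):
    """n[i][j] = 1 if treatment i + 1 occurs in block j, else 0, as in (2.1)."""
    return [[1 if i + 1 in blk else 0 for blk in blocks] for i in range(v)]


def gram(n):
    """Return the product N N' as a list of rows."""
    return [[sum(x * y for x, y in zip(ri, rj)) for rj in n] for ri in n]


def det(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in matrix]
    size, sign, result = len(a), 1, Fraction(1)
    for c in range(size):
        pivot = next((r for r in range(c, size) if a[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)          # a zero column means a zero determinant
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            sign = -sign                # a row swap flips the sign
        result *= a[c][c]
        for r in range(c + 1, size):
            factor = a[r][c] / a[c][c]
            a[r] = [x - factor * y for x, y in zip(a[r], a[c])]
    return sign * result


N = incidence(BLOCKS, V)
NNT = gram(N)

if __name__ == "__main__":
    print("det(NN') =", det(NNT), " kr(r - lambda)^(v-1) =", K * R * (R - LAM) ** (V - 1))
```

Because $r > \lambda$ the determinant is positive here, which is the fact the proof turns into a contradiction when $N$ is padded with zero columns.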


REFERENCES 

[1] R. A. Fisher, "An examination of the different possible solutions of a problem in
incomplete blocks," Annals of Eugenics, London, Vol. 10 (1940), pp. 52-75.

[2] F. Yates, "Incomplete randomised blocks," Annals of Eugenics, London, Vol. 7 (1936),
pp. 121-140.


ABSTRACTS OF PAPERS 

(Presented September 1, 1949 at Boulder at the Twelfth Summer Meeting of the Institute)

1. Structure of Statistical Elements. Duane M. Studley, Foundation Research, 
Colorado Springs, Colorado. 

Research in logical semantics and in practical elementation has set forth the proposition 
that all words and ideas have set form. As a consequence of this universal proposition 
all notions and conceptions in statistics should be accessible to set-theoretic analysis and
interpretation. This paper explains the results of a preliminary analysis performed on 
statistical notions and conceptions with a view to a proper organization of definitions and 
conceptions which will, it is hoped, make possible a better and simpler construction of 
statistics from a system of basic notions. 






2. On the Relative Efficiencies of BAN Estimates. Leo Katz, Michigan State 
College, East Lansing, Michigan. 

J. Neyman, in the Proceedings of the Berkeley Symposium on Mathematical Statistics
and Probability, 1949, proved that $\chi^2$ minimum estimates with either of two alternative
definitions of $\chi^2$ are efficient, as also are the maximum likelihood estimates. He also raised
the question whether some of these estimates were better than others. This paper bears
on that question. In making $\chi^2$ minimum estimates, it is often necessary to avoid small
frequencies by grouping together at least one tail of the distribution. It is with respect to
the parameters of these modified distributions that the $\chi^2$ estimates are efficient. Define
relative efficiency in these circumstances as the ratio of the variance of an efficient estimator
in the unmodified case to that of one in the modified case. It is shown that, except for a
rectangular probability law, the relative efficiency is less than 1 and, further, it decreases as the tail
grouping is made wider. Formulae are given for the relative efficiencies of $\chi^2$ minimum
estimators for Binomial and Poisson probability laws and some representative values
computed to exhibit these effects.
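The effect described can be illustrated for a Poisson law. The sketch below does not use the formulae of the paper, which the abstract does not state; it assumes the standard Fisher information of grouped data, under which the asymptotic variance of an efficient estimator is the reciprocal of the information, so the relative efficiency is the grouped information divided by the ungrouped information $1/\lambda$. The function names and the convention that all counts $\ge c$ are pooled into one tail cell are assumptions made for the illustration.

```python
import math


def poisson_pmf(x, lam):
    """Poisson probability of the count x with mean lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


def relative_efficiency(lam, c):
    """Relative efficiency when all counts >= c are pooled into one cell (c >= 1)."""
    # Cells 0, 1, ..., c - 1 keep their own probabilities; d p_x / d lam = p_x (x / lam - 1).
    info = sum(poisson_pmf(x, lam) * (x / lam - 1.0) ** 2 for x in range(c))
    # Pooled tail cell: its probability has derivative pmf(c - 1) with respect to lam.
    tail = 1.0 - sum(poisson_pmf(x, lam) for x in range(c))
    info += poisson_pmf(c - 1, lam) ** 2 / tail
    # The ungrouped information per observation is 1 / lam.
    return lam * info


if __name__ == "__main__":
    for c in (1, 2, 3, 6):
        print(c, round(relative_efficiency(2.0, c), 4))
```

Smaller `c` means a wider pooled tail, and under these assumptions the computed efficiency falls as `c` decreases while staying below 1.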


3. Adjustment of an Inverse Matrix Corresponding to Changes in the Elements 
of a Given Column or a Given Row of the Original Matrix. Jack Sherman 
and Winifred J. Morrison, The Texas Company Research Laboratories,
Beacon, New York. 

A simple computational procedure is derived for obtaining the elements $b'_{ij}$ of an $n$th
order matrix $(B')$ which is the inverse of $(A')$, directly from the elements $b_{ij}$ of a matrix
$(B)$ which is the inverse of $(A)$, when $(A')$ differs from $(A)$ only in the elements of one
column, say the $S$th column. The equations which form the basis of the computation are:

$$b'_{Sj} = \frac{b_{Sj}}{\sum_{r=1}^{n} b_{Sr}\, a'_{rS}}, \qquad j = 1, 2, \cdots, n,$$

$$b'_{ij} = b_{ij} - b'_{Sj} \sum_{r=1}^{n} b_{ir}\, a'_{rS}, \qquad i = 1, 2, \cdots, S - 1, S + 1, \cdots, n; \quad j = 1, 2, \cdots, n.$$

Analogous equations are derived for the case that A and A' differ in the elements of a 
given row rather than a column. 
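The two equations translate directly into a short routine. The following is a sketch under stated assumptions, not the authors' program: the function name `update_inverse_column` is hypothetical, indices are zero-based (so `s` is $S - 1$), and exact rational arithmetic is used only to keep the example free of rounding.

```python
from fractions import Fraction


def update_inverse_column(b, new_col, s):
    """Inverse of A' from b = inverse of A, where A' is A with column s replaced by new_col."""
    n = len(b)
    # c[i] = sum over r of b[i][r] * a'[r][s]
    c = [sum(b[i][r] * new_col[r] for r in range(n)) for i in range(n)]
    if c[s] == 0:
        raise ZeroDivisionError("the modified matrix A' is singular")
    # First equation: row s of the new inverse.
    row_s = [b[s][j] / c[s] for j in range(n)]
    # Second equation: every other row is corrected using row s.
    return [row_s if i == s else [b[i][j] - row_s[j] * c[i] for j in range(n)]
            for i in range(n)]


# A = [[2, 1], [1, 1]] has inverse B = [[1, -1], [-1, 2]].
B_INV = [[Fraction(1), Fraction(-1)], [Fraction(-1), Fraction(2)]]
# Replace the first column of A by (3, 1), giving A' = [[3, 1], [1, 1]].
NEW_INV = update_inverse_column(B_INV, [Fraction(3), Fraction(1)], 0)

if __name__ == "__main__":
    print(NEW_INV)
```

The update costs on the order of $n^2$ operations, against roughly $n^3$ for inverting $(A')$ from scratch, which is the practical point of the procedure.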


4. On the Problem of Optimum Classification. Paul G. Hoel, University of 
California at Los Angeles. 


Let $f_i$ $(i = 1, 2, \cdots, k)$ be the probability density function of population $i$ and let
$p_i$ be the probability that population $i$ will be sampled. Assume certain differentiability
conditions and moment properties. Then, for known parameters, the probability of a
correct classification will be maximized by choosing the region $M_i$, which corresponds to
classifying into population $i$, as that part of variable space where $p_i f_i \ge p_j f_j$ $(j = 1, 2, \cdots, k)$.
If the parameters are unknown, an asymptotically optimum set of estimates will be
given by the set that minimizes a certain form in the covariances. Among uncorrelated
estimates, maximum likelihood estimates are seen to be asymptotically optimum.

If weight functions, $W_{ij}$, are introduced and the expected value of the loss is minimized,
the same methods of proof show that the region $M_i$ becomes that part of variable space
where $\sum_{r=1}^{k} p_r f_r (W_{rj} - W_{ri}) \ge 0$ $(j = 1, 2, \cdots, k)$, and that the criterion for an
asymptotically optimum set of estimates is of the same form as the preceding criterion.
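For known parameters the two regions can be sketched in a few lines. This is an illustration only: the function name `classify`, the two normal populations, the equal sampling probabilities and the loss table are all hypothetical, and $W_{ri}$ is read here as the loss incurred when an observation from population $r$ is classified into population $i$, which is an interpretation the abstract does not spell out.

```python
from statistics import NormalDist


def classify(x, priors, densities, loss=None):
    """Index i of the region M_i that contains the observation x."""
    k = len(priors)
    weighted = [priors[r] * densities[r](x) for r in range(k)]   # p_r f_r(x)
    if loss is None:
        # Region where p_i f_i >= p_j f_j for every j.
        return max(range(k), key=lambda i: weighted[i])
    # Expected loss of deciding i is sum_r p_r f_r(x) W[r][i]; choosing the minimum
    # is equivalent to sum_r p_r f_r (W[r][j] - W[r][i]) >= 0 for every j.
    risk = [sum(weighted[r] * loss[r][i] for r in range(k)) for i in range(k)]
    return min(range(k), key=lambda i: risk[i])


# Hypothetical example: N(0, 1) versus N(2, 1), equal sampling probabilities.
PRIORS = [0.5, 0.5]
DENSITIES = [NormalDist(0.0, 1.0).pdf, NormalDist(2.0, 1.0).pdf]
# Hypothetical losses: calling a population-0 item "1" costs five times the reverse error.
LOSS = [[0.0, 5.0], [1.0, 0.0]]

if __name__ == "__main__":
    for x in (0.5, 1.5, 2.5):
        print(x, classify(x, PRIORS, DENSITIES), classify(x, PRIORS, DENSITIES, LOSS))
```

With equal losses the boundary sits midway between the two means; the unequal loss table moves it toward the second mean, so an observation at 1.5 changes region.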





5. Optimal Linear Prediction of Stochastic Processes whose Covariances are 
Green's Functions. C. L. Dolph and M. A. Woodbury, University of Michi¬ 
gan, Ann Arbor. 

A method of unbiased, minimal variance, linear prediction is developed for problems
similar to those of prediction and filtering treated by Wiener. It differs from these in that
the unbiased condition is imposed, only a finite part of the past is employed, and no
stationary assumption is used. It is shown that the special stationary case discussed by
Cunningham and Hynd, "Random Processes in Problems of Air Warfare" (Supp. Journal Royal
Stat. Soc., 1946) succeeds because the correlation function, well known to be that of
the process defined by the Langevin equation, is the Green's function of the homogeneous
differential equation formed by letting the adjoint differential operator of the Langevin
equation operate on the operator of this equation. This relationship is shown to persist
for any physically stable linear differential equation driven by "white noise." The well-
known equivalence between integral and differential equations is then extended by use of
Stieltjes integrals and used to effect the solutions of the integral equations of the first kind
which yield the "optimum" linear prediction. The nonstationary example consisting of
purely random motion about a mean linear path in the presence of radar type errors is
treated in detail.


6. The Integral of the Gaussian Distribution over the Area Bounded by an 
Ellipse. H. H. Germond, RAND Corporation, Santa Monica, California. 

This paper describes the preparation of tables from which to obtain the integral of a 
bivariate Gaussian distribution over the area of an ellipse. The center of the ellipse need 
not coincide with the mean of the Gaussian distribution, nor need the axes of the ellipse 
have any special orientation with respect to the Gaussian distribution. 

7. Theorems on Convergency of Compound Distributions with Symmetric Com¬ 
ponents. (By title) Maria Castellani, University of Kansas City. 

The purpose of this paper is to present some results obtained when operations of
convolution in $R_1$ are concerned with a specific family of distributions. The compound
distribution $K(x) = F(x) * G(x)$ is here obtained combining any d.f. $F(x)$ with a d.f. $G(x)$ under
the restriction of symmetry, i.e., $G(x + h) + G(x - h) = 1$ for any $h > 0$.

A generalization of Cantelli's Inequalities will enable us to write a preliminary theorem
on the following upper and lower bounds:

$$F(a - h) - 2\int_h^\infty dG(y) \le K(a) \le F(a + h) + 2\int_h^\infty dG(y),$$

$$K(a - h) - 2\int_h^\infty dG(y) \le F(a) \le K(a + h) + 2\int_h^\infty dG(y),$$

where $a$ is any point in $R_1$ and $h > 0$.

The theorem is derived assuming the Stieltjes Integral,

$$K(a) = \int_{-\infty}^{\infty} F(a - y)\, dG(y),$$

is taken as a sum of three integrals connected with three convenient intervals $(-\infty, -h)$,
$(-h, +h)$, $(+h, +\infty)$. When the symmetric component of the convolution is a member of a
family of normal distributions such as

$$G_\alpha(x) = [\ldots] \int_{-\infty}^{x} [\ldots]\, dy,$$

where $\alpha$ is an arbitrary parameter, the use of Cantelli's Inequalities gives

$$K_\alpha(a - h) - K_\alpha(a) - [\ldots] \int [\ldots]\, du \le F(a) - K_\alpha(a) \le K_\alpha(a + h) - K_\alpha(a) + [\ldots] \int [\ldots]\, du,$$

where $K_\alpha(x) = F(x) * G_\alpha(x)$.

The d.f. $K_\alpha(x)$ is a continuous point function in $R_1$, with a frequency function which is
everywhere uniformly continuous. For an arbitrarily small $\epsilon > 0$, a convenient small $h$ and large $\alpha$
may be found which will enable us to prove the following two theorems:

Theorem 1: Given any d.f. $F(x)$ in $R_1$, there exists a convenient continuous d.f. $K_\alpha(x)$
which for $\alpha \to \infty$ converges asymptotically and uniformly almost everywhere to the given
d.f. $F(x)$.

Theorem 2: Given any d.f. $F(x)$ in $R_1$, there exists in any continuity bordered interval
a convenient uniformly convergent series of continuous functions which asymptotically
approach the given $F(x)$.


8. Partial Sums of the Negative Binomial in Terms of the Incomplete Beta-
Function. (By title) Julius Lieblein, Statistical Engineering Laboratory,
National Bureau of Standards.


In acceptance sampling a certain size sample is taken at random from a lot of items and 
the lot is accepted if the number of defective items does not exceed a predetermined number
characteristic of the sampling plan. The Statistical Engineering Laboratory has been 
studying the probabilities that a decision to accept or reject can be made before the sample 
is completely inspected. Such probabilities are found to involve certain sums apparently 
not previously treated. In this note the author proves a simple identity connecting these 
sums which greatly facilitates their computation and shows how they may be written in 
terms of the well-known incomplete beta-function of Karl Pearson, for which extensive 
tables are available. 

9. Large Sample Tests and Confidence Intervals for Mortality Rates. (By title)
John E. Walsh, RAND Corporation, Santa Monica, California.

In computing mortality rates from insurance data, the unit of measurement used is
frequently based on number of policies or amount of insurance rather than on lives. Then
the death of one person may result in several units of "death" with respect to the
investigation; moreover, the number of units per individual may vary noticeably. Thus the usual
large sample methods of obtaining significance tests and confidence intervals for the true
value of the mortality rate are not applicable to these situations. If the number of units
associated with each person in the investigation were known, accurate large sample results
could be obtained; however, determination of the number of units associated with each
individual would require an extremely large amount of work. This article presents some
valid large sample tests and confidence intervals for the mortality rate which do not
require much work and are reasonably efficient. The procedure followed consists in first
dividing the risks into twenty-six subgroups on the basis of the first letter of the last name
of the person insured. Some of the groups are then combined until 10 to 15 subgroups
yielding approximately the same number of units are obtained. The fraction consisting of
the total number of units paid divided by the total number of units exposed is computed
for each subgroup. Asymptotically the resulting observations represent independent
observations from continuous symmetrical populations with common median equal to the
true value of the rate of mortality. Tests and confidence intervals for the rate of mortality
are obtained by applying the results of the paper "Some Significance Tests for the
Median which are Valid Under Very General Conditions" (Annals of Math. Stat., Vol. 20
(1949), pp. 64-81) to these observations.
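The last two steps of the procedure can be sketched as follows. This is not the set of tests from the cited paper: as a stand-in it uses the elementary order-statistic (sign-test) interval for a median, whose confidence level follows from the binomial distribution with probability one half. The function names and the twelve subgroup totals are hypothetical and serve only to make the sketch runnable.

```python
from math import comb


def subgroup_rates(paid, exposed):
    """Units paid divided by units exposed, one ratio per subgroup."""
    return [p / e for p, e in zip(paid, exposed)]


def median_interval(values, i):
    """Interval (x_(i), x_(n+1-i)) for the median and its sign-test confidence level."""
    x = sorted(values)
    n = len(x)
    if not 1 <= i <= n // 2:
        raise ValueError("i must lie between 1 and n // 2")
    # P(fewer than i observations fall below the median) for a continuous population.
    tail = sum(comb(n, j) for j in range(i)) / 2 ** n
    return x[i - 1], x[n - i], 1.0 - 2.0 * tail


# Hypothetical totals for 12 combined subgroups (units paid, units exposed).
PAID = [41, 37, 45, 39, 52, 36, 44, 40, 38, 47, 43, 35]
EXPOSED = [4000, 3900, 4100, 4050, 4200, 3950, 4000, 4100, 3900, 4150, 4050, 3850]
RATES = subgroup_rates(PAID, EXPOSED)

if __name__ == "__main__":
    low, high, level = median_interval(RATES, 3)
    print(f"{low:.5f} to {high:.5f} at confidence {level:.4f}")
```

Only the subgroup ratios enter the calculation, so the number of units carried by each insured person never has to be determined, which is the economy the abstract describes.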


NEWS AND NOTICES 

Readers are invited to submit to the Secretary of the Institute news items of interest 

Personal Items 

Mr. Fred C. Andrews will be a teaching assistant in the Statistical Laboratory, 
Department of Mathematics, University of California for the academic year 
1949-1950. 

Dr. Joseph Berkson has been promoted to the rank of Professor in the Uni¬ 
versity of Minnesota Graduate School and Mayo Foundation. He continues as 
Chief of the Division of Biometry and Medical Statistics of the Mayo Clinic. 

Mr. Colin R. Blyth is now a research assistant at the University of California, 
Statistical Laboratory, Berkeley. 

Mr. Clyde A. Bridger is now Director of the Section of Statistics and State 
Registrar of Vital Statistics for the Division of Health of Missouri. 

Mr. Loren V. Burns, formerly with the MFA Milling Company at Springfield,
Missouri, has been made Vice-President and Technical Director of the Spear 
Mills, Inc., Kansas City 6, Missouri. 

Professor Douglas Chapman, who obtained his Ph.D. in statistics at the Uni¬ 
versity of California, Berkeley, has accepted an appointment as Assistant Pro¬ 
fessor at the University of Washington in the Department of Mathematics and 
the Laboratory of Statistical Research. 

Dr. Andrew Laurence Comrey, who received his doctor’s degree from the Uni¬ 
versity of Southern California last June, has accepted an assistant professorship 
in the Department of Psychology at the University of Illinois. 

Dr. Donald A. Darling has been appointed to an instructorship in the Depart¬ 
ment of Mathematics, University of Michigan. 

Dr. Paul M. Densen resigned his position as Chief of the Division of Medical 
Research Statistics of the Department of Medicine and Surgery of the Veterans 
Association as of July 1, 1949 to join the staff of the Graduate School of Public 
Health, University of Pittsburgh, as an Associate Professor of Biostatistics. 

Mr. Amron H. Katz has been promoted to the position of Chief Physicist of 
the Photographic Laboratory, Engineering Division, Air Material Command, 
Wright Patterson Air Force Base, Dayton, Ohio. 

Associate Professor Louis Guttmann, who had been on leave for two years from
the Department of Sociology of Cornell University conducting a research
program in Israel, was invited to remain in Israel for another year to direct the
activities of the recently founded Israel Institute of Public Opinion Research.
He is serving as Chief Consultant.

Mr. Heme Ernest LaFontant who was attending the University of Michigan 
during the academic year 1948-1949 working on his doctor's degree, has accepted
a position as statistician for the B.T.W. Insurance Co. at Birmingham, Alabama. 

Assistant Professor Jerome C. R. Li has been promoted to Associate Professor 
of Mathematics at the Oregon State College, Corvallis, Oregon. 

Professor H. B. Mann of Ohio State University has accepted a visiting 
professorship and research associateship at the Statistical Laboratory at 
Berkeley, California for the year 1949-1950. 

Dr. Gottfried E. Noether has been appointed to an instructorship at New York 
University. 

Dr. G. R. Seth has just returned from a trip to England, Sweden, France and 
India where he visited statistical institutions. 

Assistant Professor Andrew Sobczyk has been promoted to Associate Professor 
of Mathematics at Boston University. 

Dr. Zenon Szatrowski, formerly with the Economics Department of North¬ 
western University, has accepted an associate professorship in the School of 
Business Administration, University of Buffalo. 

Professor Gerhard Tintner has returned to his teaching and research duties at 
Iowa State College after spending a year at the Department of Applied Eco¬ 
nomics at the University of Cambridge, England. He gave a course on Econ¬ 
ometrics at the University of Cambridge and during his stay in Europe, he 
lectured on econometric and statistical subjects in Universities at Bristol, Dublin, 
Hull, Paris, Manchester and Uppsala. 

Dr. A. E. R. Westman, Director of the Department of Chemistry, Ontario 
Research Foundation, left in September, 1949 for England where he is visiting 
industrial research laboratories and engaging in studies in the Department of 
Physical Chemistry, Cambridge University. He plans to return in June, 1950. 


Word has just been received here of the formation of the New Zealand Statistical
Association. The initial meeting was held in August, 1948 at Victoria University
College. The officers are: J. T. Campbell, President; I. D. Dick, Secretary.
It is planned to hold one formal meeting a year at first with the hope of increasing
this later. The main interest in statistical work in New Zealand has been
biological, but there is scope for considerable extension to industrial, educational and
economic fields and it is hoped the formation of the Association will assist in this
extension.

New Members 

The following persons have been elected to Membership in the Institute 
{June 1, 1949 to August 22, 1949) 

Al-Doori, Younis A., Student at the University of California, 1916 Henry Street, Berkeley,
California.

Blither, Robert A., A.B. (Univ. of Calif.) S-18 Richmond Terrace, Richmond, California.

Bula, Clotilde Angelica, Ph.D. (Univ. of Rosario, Argentina) Professor, University of
Buenos Aires, RiojaS681, Olivos-Pcia. de Buenos Aires, Argentina.

Dalziel, Edwin R., Ph.D. (Univ., Edinburgh) Assistant Master, Palmerston North Technical
School, Palmerston North, New Zealand.

Douglas, James B., Dip. Ed. (Melbourne Univ.) Lecturer in Mathematics, Newcastle Technical
College, Tighe's Hill 2N, N.S.W., Australia.

Hartley, Herman O., Ph.D. (Cambridge Univ.) Lecturer in Statistics, Department of
Statistics, University College, London, W.C.1, England.

Immel, Eric R., M.A. (Queen's Univ., Kingston, Canada) Teaching Assistant and Graduate
Student, Department of Mathematics, University of California at Los Angeles, Los
Angeles, California.

Kelly, John P., Senior Technical Engineer, Carbide and Carbon Chemical Corporation,
Oak Ridge, Tennessee, P.O. Box 47S, Norris, Tennessee.

Parel, Cristina P., M.S. (Univ. of Michigan) Instructor, Department of Mathematics,
University of the Philippines, Manila, P.I.

Philipson, Carl O., D.Sc. (Univ. of Stockholm) Actuary of Folket-Samarbete, Yngvevagen
5, Djursholm, Sweden.

Porter, Robert A., Ph.D. (N.C. State College, Raleigh, N.C.) Senior Mathematician,
University of Chicago, 17113 Longfellow Avenue, Homewood, Illinois.

Rippe, Dayle D., M.A. (Univ. of Nebr.) Student, Teaching Fellow, Department of
Mathematics, University of Michigan, lOJ^B Woburn Court, Willow Run, Michigan.

Rogers, Robert L., A.B. (Univ. of Calif.) Student at University of California, Route 3,
Box 74, Denio Avenue, Gilroy, California.

Roy, Samarendra N., M.Sc. (Calcutta Univ.) Head of Department of Statistics, Calcutta
University and Assistant Director, Indian Statistical Institute (now on leave) P.O.
Box 168, Chapel Hill, North Carolina.

Savey, Rosemary, M.B.A. (Univ. of Wisc.) Graduate Assistant and Student, University of
Wisconsin, 3S1S Norwood Place, Madison 6, Wisconsin.


REPORT ON THE BOULDER MEETING OF THE INSTITUTE 

The Twelfth Summer Meeting of the Institute of Mathematical Statistics 
was held at the University of Colorado, Boulder, Colorado, Monday, August 29 
through Thursday, September 1, 1949. The meeting was held in conjunction 
with the summer meetings of the American Mathematical Society, the Mathe¬ 
matical Association of America, and the Econometric Society. The meeting was 
attended by the following 79 members of the Institute: 

S. P. Agarwal, R. L. Anderson, T. W. Anderson, V. L. Anderson, K. J. Arnold, E. W.
Barankin, C. A. Bennett, Agnes Berger, E. E. Blanche, A. H. Bowker, J. C. Brixey, Jean
Bronfenbrenner, J. H. Bushey, H. C. Carver, Herman Chernoff, K. L. Chung, A. G. Clark,
E. P. Coleman, E. L. Crow, J. H. Curtiss, W. J. Dixon, J. L. Doob, Aryeh Dvoretzky,
H. P. Evans, W. D. Evans, W. T. Federer, William Feller, C. H. Fischer, J. S. Frame, T. C.
Fry, H. M. Gehman, H. H. Germond, R. E. Greenwood, H. T. Guard, P. R. Halmos, J. L.
Hodges, P. G. Hoel, Harold Hotelling, J. M. Howell, C. C. Hurd, C. A. Hutchinson, Irving
Kaplansky, Leo Katz, H. S. Konijn, T. C. Koopmans, G. M. Kuznets, H. D. Larsen, D. H.
Leavens, S. B. Littauer, H. B. Mann, Jacob Marschak, F. J. Massey, Dorothy J. Morrow,
Jerzy Neyman, M. L. Norden, J. I. Northam, E. G. Olds, R. P. Peterson, G. B. Price,
Mina S. Rees, P. R. Rider, F. D. Rigby, Herman Rubin, L. J. Savage, Elizabeth R. Scott,
I. E. Segal, Esther Seiden, Jack Sherman, W. B. Simpson, Milton Sobel, D. M. Studley,
B. R. Suydam, A. G. Swanson, James Templeton, R. M. Thrall, J. W. Tukey, Abraham
Wald, John Wishart, S. S. Wilks.

The Monday afternoon session was devoted to invited addresses with Pro¬ 
fessor Leonard J. Savage of the University of Chicago presiding. The attendance 
was approximately fifty. Professor J. L. Hodges of the University of California 
presented a paper, Some Problems in Point Estimation, and Professor W. T. 
Federer of Cornell University presented A Comparison of the Proportionality of 
Covariance Matrices. 

On Tuesday morning the Institute, the Mathematical Association of America,
and the Econometric Society held a joint symposium on Mathematical Training
for Social Scientists. Professor Jacob Marschak of the Cowles Commission for
Research in Economics presided. The attendance was approximately one hundred
fifty. The participating speakers were: Professor R. L. Anderson of North
Carolina State College; Professor T. W. Anderson of Columbia University;
Professor G. C. Evans of the University of California; Professor F. L. Griffin
of Reed College; Professor Harold Gulliksen of Educational Testing Service;
Professor William Jaffé of Northwestern University; Professor Harold Hotelling
of the University of North Carolina; and Professor G. M. Kuznets of the
University of California. At the end of the session the following resolution was
adopted by those in attendance at the meeting:

Members of the Mathematical Association of America, the Institute of Mathematical 
Statistics, and the Econometric Society assembled in a joint session in Boulder, Colorado, 
on August 30, 1949, are of the opinion that officers of these societies should study the 
need for better mathematical training of social scientists, and the ways and means to
improve mathematical preparation of social scientists, and that such a study may be 
most effectively conducted by a joint committee, possibly in co-operation with other 
interested societies, and in close touch with the Social Science Research Council, the 
National Research Council, or other national bodies concerned with general education 
and research. It is suggested that this committee report the results of its deliberations 
at the next joint meeting of the original participating societies. 

The two joint sessions of the Institute and the Econometric Society were
devoted to a Symposium on Statistical Inference in Decision Making. Professor
Jerzy Neyman of the University of California presided on Tuesday afternoon.
The attendance was approximately eighty. Professor Aryeh Dvoretzky of
Hebrew University, Jerusalem presented Decision Problems and Professor
Abraham Wald of Columbia University presented Some Recent Results in the Theory
of Statistical Decision Functions. On Wednesday morning, under the chairmanship
of Professor Wald and with an attendance of approximately seventy-five, the
following papers were presented: Remarks on a Rational Selection of a Decision
Function by Professor Herman Chernoff of the Cowles Commission for Research
in Economics; Psychological Probabilities by Professor Leonard J. Savage; and
Complete Classes of Decision Functions for Some Standard Sequential and
Nonsequential Problems by Milton Sobel of Columbia University.





On Thursday Morning the Institute and the American Mathematical Society 
held a joint session for contributed papers with Professor P. R. Rider of Wash¬ 
ington University presiding. The attendance was approximately seventy-five. 
The following papers were presented: 

1. Structure of Statistical Elements. 

Mr. Duane M. Studley, Foundation Research, Colorado Springs. 

2. On the Relative Efficiencies of BAN Estimates. 

Professor Leo Katz, Michigan State College. 

3. Adjustments of an Inverse Matrix Corresponding to Changes in the Elements of a Given 
Column or a Given Row of the Original Matrix. 

Dr. Jack Sherman and Miss Winifred J. Morrison, The Texas Company Research 
Laboratories, Beacon, New York. 

4. On the Problem of Optimum Classification. 

Professor Paul G. Hoel, University of California at Los Angeles. 

5. Optimal Linear Prediction of Stochastic Processes whose Covariances are Green's
Functions.

Professor C. L. Dolph and Dr. M. A. Woodbury, University of Michigan.

6. The Integral of the Gaussian Distribution over the Area Bounded by an Ellipse. 

Dr. H. H. Germond, Rand Corporation, Santa Monica, California. 

7. Theorems on Convergency of Compound Distributions with Symmetric Components. 
(By title) 

Dr. Maria Castellani, University of Kansas City. 

8. Large Sample Tests and Confidence Intervals for Mortality Rates. (By title) 

Dr. J. E. Walsh, Rand Corporation, Santa Monica, California. 

9. Partial Sums of the Negative Binomial in Terms of the Incomplete Beta-function. (By
title)

Dr. Julius Lieblein, National Bureau of Standards.

On Thursday afternoon Professor Jerzy Neyman presented the Second Rietz
Memorial Lecture on Consistent Estimates of the Linear Structural Relation in
the General Case of Identifiability. Professor Harold Hotelling presided and the
attendance was approximately fifty. Dr. R. P. Boas, Jr. of Mathematical Reviews
presented an invited address, The Representation of Probability Distributions by
Charlier Series.

The Institute sponsored a beer party on Tuesday evening and on Thursday 
evening a fry was held on Flagstaff Mountain. 

Harris T. Guard 
Assistant Secretary 



JOURNAL OF THE AMERICAN 
STATISTICAL ASSOCIATION 

JUNE 1949 
Articles 

The Current Status of State and Local Population Estimates in the Census 

Bureau. Henry S. Shryock, Jr., and Norman Lawrence 

The Uses and Usefulness of Binomial Probability Paper. 

Frederick Mosteller and John W. Tukey
Teaching Statistical Quality Control for Town and Gown

Edwin G. Olds and Lloyd A. Knowler

The Use of Sampling in Great Britain. C. A. Moser

Unemployment and Migration in the Depression (1930-1936) 

Ronald Freedman and Amos H. Hawley 
Minimum χ² and Maximum Likelihood Solution in Terms of a Linear Transform,

with Particular Reference to Bio-Assay. Joseph Berkson

Some Inadequacies of the Federal Censuses of Agriculture. Raymond J. Jessen

The Edge Marking of Statistical Cards. A. M. Lester 

Conrad Alexander Verrijn Stuart (1865-1948). Walter F. Willcox 

Proceedings of the 108th Annual Meeting. 

Book Reviews 

AMERICAN STATISTICAL ASSOCIATION 
1603 K Street, N. W., Washington 6, D. C. 


MATHEMATICAL REVIEWS 

A journal containing reviews of the mathematical literature
of the world, with full subject and author indices

Publication of this journal is sponsored by the American Mathematical
Society, Mathematical Association of America, Institute of
Mathematical Statistics, London Mathematical Society, Edinburgh 
Mathematical Society, Union Matematica Argentina, and others. 

Subscriptions accepted to cover the calendar year only. 

Issues appear monthly except July. $20.00 per year. 

Send subscription order or request for sample copy to 


AMERICAN MATHEMATICAL SOCIETY 
531 West 116th Street, New York City 27 








SKANDINAVISK

AKTUARIETIDSKRIFT

1948 - Parts 3-4 

Contents 

E. Kivikoski: Über die Konvergenz des Iterationsverfahrens bei der Berechnung

des effektiven Zinsfusses. 136

W. Simonsen: On Divided Differences and Osculatory Interpolation. 157

Ernst Zwinggi: Initiation of a Formula for Approximate Valuation of Premiums

for Disability Benefits. 166

H. Ammeter: A Generalisation of the Collective Theory of Risk in Regard 

to Fluctuating Basic-Probabilities. 171 

Tryggwe Saxén: On the Probability of Ruin in the Collective Risk Theory

for Insurance Enterprises with only Negative Risk Sums. 199 

H. Wold: On Stationary Point Processes and Markov Chains. 229 


Annual subscription: 10 Swedish Crowns (Approx. $2.0D). 
Inquiries and orders may be addressed to the Editor, 

SKÄRVIKSVÄGEN 7, DJURSHOLM (SWEDEN)


BIOMETRIKA 

A Journal for the Statistical Study of Biological Problems 

Volume XXXVI Contents Parts I and II, June 1949 

I. The infectiousness of measles. By MAJOR GREENWOOD. II. A note on the
analysis of grouped probit data. By K. D. TOCHER. III. A generalization of
Poisson's binomial limit for use in ecology. By MARJORIE THOMAS. IV. The
estimation and comparison of residual regressions where there are two or more related
sets of observations. By A. H. CARTER. V. Cumulants of multivariate multinomial
distributions. By JOHN WISHART. VI. On the Wishart distribution in
statistics. By A. C. AITKEN. VII. The spectral theory of discrete stochastic
processes. By P. A. P. MORAN. VIII. On a property of distributions admitting
sufficient statistics. By V. S. HUZURBAZAR. IX. On a method of trend elimination.
By M. H. QUENOUILLE. X. On the estimation of dispersion by linear
systematic statistics. By H. J. GODWIN. XI. On the reconciliation of theories
of probability. By M. G. KENDALL. XII. The derivation and partition of χ² in
certain discrete distributions. By H. O. LANCASTER. XIII. A note on the
subdivision of χ² into components. By J. O. IRWIN. XIV. The first and second
moments of some probability distributions arising from points on a lattice and their
application. By P. V. KRISHNA IYER. XV. Probability Tables for the range.
By E. J. GUMBEL. XVI. Systems of frequency curves generated by methods of
translation. By N. L. JOHNSON. XVII. Rank and product-moment correlation.
By M. G. KENDALL. XVIII. Tests of significance in harmonic analysis. By H.
O. HARTLEY. XIX. The non-central χ² and F-distributions and their applications.
By P. B. PATNAIK. XX. MISCELLANEA: On a method of estimating
frequencies. By D. J. FINNEY. A further note on the mean deviation from the
median. By K. R. NAIR. REVIEWS: Theory of probability and Karl Pearson’s
early statistical papers.

The subscription price, payable in advance, is 46s. inland, 54s. export (per volume including postage). Cheques
should be drawn to Biometrika and sent to "The Secretary, Biometrika Office, Department of Statistics,
University College, London, W.C. 1." All foreign cheques must be in sterling and drawn on a bank
having a London agency.










SANKHYĀ

The Indian Journal of Statistics 

Edited by P. C. Mahalanobis


Vol. IX, Parts 2 and 3, 1949 

Part I, The Field Survey. D. N. Majumdar 

Chapter 1. Previous work in India on physical anthropometry 
Chapter 2. Collection of anthropometric data in the 1941 survey

Part II, Statistical Analysis. P. C. Mahalanobis & C. R. Rao

Chapter 3. Arrangements for statistical analysis 
Chapter 4. Basic statistical concepts 
Chapter 5. Normality of frequency distributions 
Chapter 6. Caste and tribal differences 

Part III, Anthropological Observations. P. C. Mahalanobis

Chapter 8. Physical appearance in relation to ethnological evidence 
Supplement: Ethnological notes 


Annual subscription: 30 rupees 
Inquiries and orders may be addressed to the 
Editor, Sankhyā, Presidency College, Calcutta, India.


ECONOMETRICA 

Journal of the Econometric Society 
Contents of Vol. 17, No. 2, April, 1949, include:

Page 

Colin Clark: A System of Equations Explaining the United States Trade 
Cycle, 1921 to 1941. 93 

Tjalling C. Koopmans: Identification Problems in Economic Model Construc¬ 
tion . 125 

Lawrence R. Klein: A Scheme of International Compensation. 145

M. H. Ekker: A Scheme of International Compensation: Postscript. 150 

ANNOUNCEMENTS, NOTES, AND MEMORANDA 


Published Quarterly Subscription to Nonmembers: $9.00 per year 

The Econometric Society is an international society for the advancement of economic theory in its
relation to statistics and mathematics.

Subscriptions to Econometrica and inquiries about the work of the Society and the procedure in
applying for membership should be addressed to Alfred Cowles, Secretary and Treasurer, The Econometric
Society, The University of Chicago, Chicago 37, Illinois, U.S.A.












